Abstract

For $f$ a weighted voting scheme used by $n$ voters to choose between two candidates, the $n$
Shapley-Shubik indices (or Shapley values) of $f$ provide a measure of how much control each
voter can exert over the overall outcome of the vote. Shapley-Shubik indices were introduced by
Lloyd Shapley and Martin Shubik in 1954 [SS54] and are widely studied in social choice theory
as a measure of the "influence" of voters. The Inverse Shapley Value Problem is the problem
of designing a weighted voting scheme which (approximately) achieves a desired input vector
of values for the Shapley-Shubik indices. Despite much interest in this problem, no provably
correct and efficient algorithm was known prior to our work.

We give the first efficient algorithm with provable performance guarantees for the Inverse
Shapley Value Problem. For any constant $\epsilon > 0$ our algorithm runs in fixed poly($n$) time (the
degree of the polynomial is independent of $\epsilon$) and has the following performance guarantee: given
as input a vector of desired Shapley values, if any "reasonable" weighted voting scheme (roughly,
one in which the threshold is not too skewed) approximately matches the desired vector of values
to within some small error, then our algorithm explicitly outputs a weighted voting scheme that
achieves this vector of Shapley values to within error $\epsilon$. If there is a "reasonable" voting scheme
in which all voting weights are integers at most poly($n$) that approximately achieves the desired
Shapley values, then our algorithm runs in time poly($n$) and outputs a weighted voting scheme
that achieves the target vector of Shapley values to within error $\epsilon = n^{-1/8}$.

1 Introduction

In this paper we consider the common scenario in which each of $n$ voters must cast a binary vote for
or against some proposal. What is the best way to design such a voting scheme? Throughout the
paper we consider only weighted voting schemes, in which the proposal passes if a weighted sum of
yes-votes exceeds a predetermined threshold. Weighted voting schemes are predominant in voting
theory and have been extensively studied for many years; see [EGGW07, ZFBE08] and references
therein. In computer science language, we are dealing with linear threshold functions (henceforth
abbreviated as LTFs) over $n$ Boolean variables.

If it is desired that each of the $n$ voters should have the same "amount of power" over the
outcome, then a simple majority vote is the obvious solution. However, in many scenarios it may
be the case that we would like to assign different levels of voting power to the $n$ voters - perhaps
they are shareholders who own different amounts of stock in a corporation, or representatives of
differently sized populations. In such a setting it is much less obvious how to design the right voting
scheme; indeed, it is far from obvious how to correctly quantify the notion of the "amount of power"
that a voter has under a given fixed voting scheme. As a simple example, consider an election with
three voters who have voting weights 49, 49 and 2, in which a total of 51 votes are required for the
proposition to pass. While the disparity between voting weights may at first suggest that the two
voters with 49 votes each have most of the "power," any coalition of two voters is sufficient to pass
the proposition and any single voter is insufficient, so the voting power of all three voters is in fact
equal.
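The 49/49/2 example can be checked by brute force. The following is a minimal sketch, not part of the original paper: the function name `shapley_shubik` is illustrative, and it assumes the proposal passes as soon as the running total of yes-votes reaches the quota of 51. It counts, for each voter, the fraction of all orderings in which that voter casts the pivotal vote.

```python
from fractions import Fraction
from itertools import permutations

def shapley_shubik(weights, quota):
    """Exact Shapley-Shubik indices by enumerating all n! voter orderings.

    A voter is pivotal in an ordering if the running total of yes-votes
    first reaches the quota when that voter is added (assumed convention).
    """
    n = len(weights)
    pivots = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        orderings += 1
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:
                pivots[voter] += 1
                break
    return [Fraction(p, orderings) for p in pivots]

indices = shapley_shubik([49, 49, 2], quota=51)
print(indices)
```

Under these assumptions all three indices coincide, matching the informal argument above. Enumeration takes time proportional to $n!$, so this is only practical for a handful of voters.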

Many different power indices (methods of measuring the voting power of individuals under a
given voting scheme) have been proposed over the course of decades. These include the Banzhaf
index [Ban65], the Deegan-Packel index [DP78], the Holler index [Hol82], and others (see the
extensive survey of de Keijzer [dK08]). Perhaps the best known, and certainly the oldest, of these
indices is the Shapley-Shubik index [SS54], which is also known as the index of Shapley values (we
shall henceforth refer to it as such). Informally, the Shapley value of a voter $i$ among the $n$ voters is
the fraction of all $n!$ orderings of the voters in which she "casts the pivotal vote" (see Definition 1
in Section 2 for a precise definition, and [Rot88] for much more on Shapley values). We shall work
with the Shapley values throughout this paper.

Given a particular weighted voting scheme (i.e., an $n$-variable linear threshold function),
standard sampling-based approaches can be used to efficiently obtain highly accurate estimates of
the $n$ Shapley values (see also the works of [Lee03, BMR+10]). However, the inverse problem
is much more challenging: given a vector of $n$ desired values for the Shapley values, how can
one design a weighted voting scheme that (approximately) achieves these Shapley values? This
problem, which we refer to as the Inverse Shapley Value Problem, is quite natural and has
received considerable attention; various heuristics and exponential-time algorithms have been
proposed [APL07, FWJ08, dKKZ10, Kur11], but prior to our work no provably correct and efficient
algorithms were known.
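The sampling-based approach to the forward direction can be sketched as follows. This is an illustrative Monte Carlo estimator and not the specific procedure of the cited works: the function name, the quota convention (the proposal passes once the running total reaches the quota) and the example weights are all assumptions.

```python
import random

def estimate_shapley_shubik(weights, quota, samples, seed=0):
    """Estimate Shapley-Shubik indices from uniformly random voter orderings."""
    rng = random.Random(seed)
    n = len(weights)
    pivots = [0] * n
    order = list(range(n))
    for _ in range(samples):
        rng.shuffle(order)              # a uniformly random ordering of the voters
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:        # this voter casts the pivotal vote
                pivots[voter] += 1
                break
    return [p / samples for p in pivots]

estimates = estimate_shapley_shubik([4, 3, 2, 1], quota=6, samples=20000)
print(estimates)
```

For this small example the exact indices can be worked out by hand as 5/12, 1/4, 1/4 and 1/12, so the estimates can be compared against them. Each sample costs $O(n)$ time, in contrast to the $n!$ orderings needed for exact enumeration.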

Our Results. We give the first efficient algorithm with provable performance guarantees for the
Inverse Shapley Value Problem. Our results apply to "reasonable" voting schemes; roughly, we say
that a weighted voting scheme is "reasonable" if fixing a tiny fraction of the voting weight does
not already determine the outcome, i.e., if the threshold of the linear threshold function is not
too extreme. (See Definition 2 in Section 2 for a precise definition.) This seems to be a plausible
property for natural voting schemes. Roughly speaking, we show that if there is any reasonable
weighted voting scheme that approximately achieves the desired input vector of Shapley values,
then our algorithm finds such a weighted voting scheme. Our algorithm runs in fixed polynomial
time in $n$, the number of voters, for any constant error parameter $\epsilon > 0$. In a bit more detail, our
first main theorem, stated informally, is as follows (see Section 6 for Theorem 26, which gives a
precise theorem statement):

Main Theorem (arbitrary weights, informal statement). There is a poly($n$)-time algorithm
with the following properties: The algorithm is given any constant accuracy parameter $\epsilon > 0$ and
any vector of $n$ real values $a(1), \dots, a(n)$. The algorithm has the following performance guarantee:
if there is any monotone increasing reasonable LTF $f(x)$ whose Shapley values are very close to
the given values $a(1), \dots, a(n)$, then with very high probability the algorithm outputs $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$
such that the linear threshold function $h(x) = \mathrm{sign}(v \cdot x - \theta)$ has Shapley values $\epsilon$-close to those of
$f$.

We emphasize that the exponent of the poly($n$) running time is a fixed constant that is
independent of $\epsilon$.

Our second main theorem gives an even stronger guarantee if there is a weighted voting scheme
with small weights (at most poly($n$)) whose Shapley values are close to the desired values. For this
problem we give an algorithm which achieves 1/poly($n$) accuracy in poly($n$) time. An informal
statement of this result is (see Section 6 for Theorem 27, which gives a precise theorem statement):

Main Theorem (bounded weights, informal statement). There is a poly($n, W$)-time
algorithm with the following properties: The algorithm is given a weight bound $W$ and any vector of $n$
real values $a(1), \dots, a(n)$. The algorithm has the following performance guarantee: if there is any
monotone increasing reasonable LTF $f(x) = \mathrm{sign}(w \cdot x - \theta)$ whose Shapley values are very close to
the given values $a(1), \dots, a(n)$ and where each $w_i$ is an integer of magnitude at most $W$, then with
very high probability the algorithm outputs $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$ such that the linear threshold function
$h(x) = \mathrm{sign}(v \cdot x - \theta)$ has Shapley values $n^{-1/8}$-close to those of $f$.

Discussion and Our Approach. At a high level, the Inverse Shapley Value Problem that
we consider is similar to the "Chow Parameters Problem" that has been the subject of several
recent papers [Gol06, OS08, DDFS12]. The Chow parameters are another name for the $n$ Banzhaf
indices; the Chow Parameters Problem is to output a linear threshold function which approximately
matches a given input vector of Chow parameters. (To align with the terminology of the current
paper, the "Chow Parameters Problem" might perhaps better be described as the "Inverse Banzhaf
Problem.")

Let us briefly describe the approaches in [OS08] and [DDFS12] at a high level for the purpose
of establishing a clear comparison with this paper. Each of the papers [OS08, DDFS12] combines
structural results on linear threshold functions with an algorithmic component. The structural
results in [OS08] deal with anti-concentration of affine forms $w \cdot x - \theta$ where $x \in \{-1,1\}^n$ is uniformly
distributed over the Boolean hypercube, while the algorithmic ingredient of [OS08] is a rather
straightforward brute-force search. In contrast, the key structural results of [DDFS12] are
geometric statements about how $n$-dimensional hyperplanes interact with the Boolean hypercube, which
are combined with linear-algebraic (rather than anti-concentration) arguments. The algorithmic
ingredient of [DDFS12] is more sophisticated, employing a boosting-based approach inspired by the
work of [TTV08, Imp95].



Our approach combines aspects of both the [OS08] and [DDFS12] approaches. Very roughly
speaking, we establish new structural results which show that linear threshold functions have good
anti-concentration (similar to [OS08]), and use a boosting-based approach derived from [TTV08] as
the algorithmic component (similar to [DDFS12]). However, this high-level description glosses over
many "Shapley-specific" issues and complications that do not arise in these earlier works; below
we describe two of the main challenges that arise, and sketch how we meet them in this paper.

First challenge: establishing anti-concentration with respect to non-standard
distributions. The Chow parameters (i.e., Banzhaf indices) have a natural definition in terms of the
uniform distribution over the Boolean hypercube $\{-1,1\}^n$. Being able to use the uniform
distribution with its many nice properties (such as complete independence among all coordinates) is
very useful in proving the required anti-concentration results that are at the heart of [OS08]. In
contrast, it is not a priori clear what is (or even whether there exists) the "right" distribution
over $\{-1,1\}^n$ corresponding to the Shapley values. In this paper we derive such a distribution $\mu$
over $\{-1,1\}^n$, but it is much less well-behaved than the uniform distribution (it is supported on
a proper subset of $\{-1,1\}^n$, and it is not even pairwise independent). Nevertheless, we are able
to establish anti-concentration results for affine forms $w \cdot x - \theta$ corresponding to linear threshold
functions under the distribution $\mu$ as required for our results. This is done by showing that any
reasonable linear threshold function can be expressed with "nice" weights (see Theorem 3 of
Section 2), and establishing anti-concentration for any "nice" weight vector by carefully combining
anti-concentration bounds for $p$-biased distributions across a continuous family of different choices
of $p$ (see Section 4 for details).

Second challenge: using anti-concentration to solve the Inverse Shapley problem. The
main algorithmic ingredient that we use is a procedure from [TTV08]. Given a vector of values
$(\mathbf{E}[f(x)x_i])_{i=1,\dots,n}$ (correlations between the unknown linear threshold function $f$ and the individual
input variables), it efficiently constructs a bounded function $g : \{-1,1\}^n \to [-1,1]$ which closely
matches these correlations, i.e., $\mathbf{E}[f(x)x_i] \approx \mathbf{E}[g(x)x_i]$ for all $i$. Such a procedure is very useful for
the Chow parameters problem, because the Chow parameters correspond precisely to the values
$\mathbf{E}[f(x)x_i]$ - i.e., the degree-1 Fourier coefficients of $f$ - with respect to the uniform distribution.
(This correspondence is at the heart of Chow's original proof [Cho61] showing that the exact values
of the Chow parameters suffice to information-theoretically specify any linear threshold function;
anti-concentration is used in [OS08] to extend Chow's original arguments about degree-1 Fourier
coefficients to the setting of approximate reconstruction.)

For the inverse Shapley problem, there is no obvious correspondence between the correlations of
individual input variables and the Shapley values. Moreover, without a notion of "degree-1 Fourier
coefficients" for the Shapley setting, it is not clear why anti-concentration statements with respect
to $\mu$ should be useful for approximate reconstruction. We deal with both these issues by developing
a notion of the degree-1 Fourier coefficients of $f$ with respect to distribution $\mu$ and relating these
coefficients to the Shapley values.¹ (We actually require two related notions: one is the "coordinate
correlation coefficient" $\mathbf{E}_{x \sim \mu}[f(x)x_i]$, which is necessary for the algorithmic [TTV08] ingredient, and
one is the "Fourier coefficient" $\hat f(i) = \mathbf{E}_{x \sim \mu}[f(x)L_i]$, which is necessary for Lemma 15, see below.)
We define both notions and establish the necessary relations between them in Section 3.

Armed with the notion of the degree-1 Fourier coefficients under distribution $\mu$, we prove a key
result (Lemma 15) saying that if the LTF $f$ is anti-concentrated under distribution $\mu$, then any
bounded function $g$ which closely matches the degree-1 Fourier coefficients of $f$ must be close to $f$
in $\ell_1$ distance with respect to $\mu$. (This is why anti-concentration with respect to $\mu$ is useful for us.)
From this point, exploiting properties of the [TTV08] algorithm, we can pass from $g$ to an LTF
whose Shapley values closely match those of $f$.

Organization. Useful preliminaries are given in Section 2, including the crucial fact (Theorem 3)
that all "reasonable" linear threshold functions have weight representations with "nice" weights.
In Section 3 we define the distribution $\mu$ and the notions of Fourier coefficients and "coordinate
correlation coefficients," and the relations between them, that we will need. At the end of that
section we prove a crucial lemma, Lemma 15, which says that anti-concentration of affine forms
and closeness in Fourier coefficients together suffice to establish closeness in $\ell_1$ distance. Section 4
proves that "nice" affine forms have the required anti-concentration, and Section 5 describes the
algorithmic tool from [TTV08] that lets us establish closeness of coordinate correlation coefficients.
Section 6 puts the pieces together to prove our main theorems. Finally, in Section 7 we conclude
the paper and present a few open problems.

¹We note that Owen [Owe72] has given a characterization of the Shapley values as a weighted average of $p$-biased
influences (see also [KS06]). However, this is not as useful for us as our characterization in terms of "$\mu$-distribution"
Fourier coefficients, because we need to ultimately relate the Shapley values to anti-concentration with respect to $\mu$.

2 Preliminaries

Notation and terminology. For $n \in \mathbb{Z}_+$, we denote by $[n] := \{1, 2, \dots, n\}$. For $i, j \in \mathbb{Z}_+$, $i \le j$,
we denote $[i,j] := \{i, i+1, \dots, j\}$.

Given a vector $w = (w_1, \dots, w_n) \in \mathbb{R}^n$ we write $\|w\|_1$ to denote $\sum_{i=1}^n |w_i|$. A linear threshold
function, or LTF, is a function $f : \{-1,1\}^n \to \{-1,1\}$ which is such that $f(x) = \mathrm{sign}(w \cdot x - \theta)$ for
some $w \in \mathbb{R}^n$, $\theta \in \mathbb{R}$.

Our arguments will also use a variant of linear threshold functions which we call linear bounded
functions (LBFs). The projection function $P_1 : \mathbb{R} \to [-1,1]$ is defined by $P_1(t) = t$ for $|t| \le 1$ and
$P_1(t) = \mathrm{sign}(t)$ otherwise. An LBF $g : \{-1,1\}^n \to [-1,1]$ is a function $g(x) = P_1(w \cdot x - \theta)$.

Shapley values. Here and throughout the paper we write $S_n$ to denote the symmetric group of
all $n!$ permutations over $[n]$. Given a permutation $\pi \in S_n$ and an index $i \in [n]$, we write $x(\pi,i)$ to
denote the string in $\{-1,1\}^n$ that has a 1 in coordinate $j$ if and only if $\pi(j) < \pi(i)$, and we write
$x^+(\pi,i)$ to denote the string obtained from $x(\pi,i)$ by flipping coordinate $i$ from $-1$ to $1$. With this
notation in place we can define the generalized Shapley indices of a Boolean function as follows:

Definition 1. (Generalized Shapley values) Given $f : \{-1,1\}^n \to \{-1,1\}$, the $i$-th generalized
Shapley value of $f$ is the value

$$\tilde f(i) := \mathbf{E}_{\pi \sim_R S_n}\left[ f(x^+(\pi,i)) - f(x(\pi,i)) \right] \qquad (1)$$

(where "$\pi \sim_R S_n$" means that $\pi$ is selected uniformly at random from $S_n$).

A function $f : \{-1,1\}^n \to \{-1,1\}$ is said to be monotone increasing if for all $i \in [n]$, whenever
two input strings $x, y \in \{-1,1\}^n$ differ precisely in coordinate $i$ and have $x_i = -1$, $y_i = 1$, it is the
case that $f(x) \le f(y)$. It is easy to check that for monotone functions our definition of generalized
Shapley values agrees with the usual notion of Shapley values (which are typically defined only for
monotone functions) up to a multiplicative factor of 2; in the rest of the paper we omit "generalized"
and refer to these values simply as the Shapley values of $f$.

We will use the following notion of the "distance" between the vectors of Shapley values for
two functions $f, g : \{-1,1\}^n \to [-1,1]$:

$$d_{\mathrm{Shapley}}(f,g) := \sqrt{\sum_{i=1}^n \left(\tilde f(i) - \tilde g(i)\right)^2},$$

i.e., the Shapley distance $d_{\mathrm{Shapley}}(f,g)$ is simply the Euclidean distance between the two $n$-dimensional
vectors of Shapley values. Given a vector $a = (a(1), \dots, a(n)) \in \mathbb{R}^n$ we will also use $d_{\mathrm{Shapley}}(a,f)$
to denote $\sqrt{\sum_{i=1}^n (\tilde f(i) - a(i))^2}$.

The linear threshold functions that we consider. Our algorithmic results hold for linear
threshold functions which are not too "extreme" (in the sense of having a very skewed threshold).
We will use the following definition:

Definition 2. ($\eta$-reasonable LTF) Let $f : \{-1,1\}^n \to \{-1,1\}$, $f(x) = \mathrm{sign}(w \cdot x - \theta)$ be an LTF.
For $0 < \eta < 1$ we say that $f$ is $\eta$-reasonable if $\theta \in [-(1-\eta)\|w\|_1, (1-\eta)\|w\|_1]$.

All our results will deal with $\eta$-reasonable LTFs; throughout the paper $\eta$ should be thought of
as a small fixed absolute constant (such as 1/1000). LTFs that are not $\eta$-reasonable do not seem
to correspond to very interesting voting schemes since typically they will be very close to constant
functions. (For example, even at $\eta = 0.99$, if the LTF $f(x) = \mathrm{sign}(x_1 + \cdots + x_n - \theta)$ has a threshold
$\theta > 0$ which makes it not an $\eta$-reasonable LTF, then $f$ agrees with the constant function $-1$ on all
but a $2^{-\Omega(n)}$ fraction of inputs in $\{-1,1\}^n$.)
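The definitions of this section (generalized Shapley values, Shapley distance, and $\eta$-reasonableness) can be exercised on a small example. The sketch below is illustrative only: all function names are hypothetical, $\mathrm{sign}(0)$ is taken to be $1$ as an assumption, and the 49/49/2 scheme with quota 51 is encoded in $\pm 1$ notation as $w = (49, 49, 2)$, $\theta = 1$ (since $w \cdot x$ equals twice the yes-weight minus 100).

```python
from fractions import Fraction
from itertools import permutations
from math import sqrt

def ltf(w, theta):
    """Return f(x) = sign(w.x - theta) on {-1,1}^n, with sign(0) = 1 (assumption)."""
    def f(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
    return f

def generalized_shapley(f, n):
    """Exact values of Definition 1, averaging over all n! permutations."""
    totals = [0] * n
    perms = list(permutations(range(n)))      # pi[j] is the position of voter j
    for pi in perms:
        for i in range(n):
            x = tuple(1 if pi[j] < pi[i] else -1 for j in range(n))
            x_plus = x[:i] + (1,) + x[i + 1:]  # flip coordinate i from -1 to 1
            totals[i] += f(x_plus) - f(x)
    return [Fraction(t, len(perms)) for t in totals]

def shapley_distance(a, b):
    """Euclidean distance between two vectors of Shapley values."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def is_reasonable(w, theta, eta):
    """Definition 2: theta lies in [-(1-eta)||w||_1, (1-eta)||w||_1]."""
    return abs(theta) <= (1 - eta) * sum(abs(wi) for wi in w)

w, theta = [49, 49, 2], 1
values = generalized_shapley(ltf(w, theta), 3)
majority = generalized_shapley(ltf([1, 1, 1], 0), 3)
print(values, shapley_distance(values, majority),
      is_reasonable(w, theta, Fraction(1, 1000)))
```

Each generalized value here is twice the corresponding classical index of 1/3, reflecting the factor of 2 noted after Definition 1, and the scheme has the same Shapley values as simple majority on three voters.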

Turning from the threshold to the weights, some of the proofs in our paper will require us to
work with LTFs that have "nice" weights in a certain technical sense. Prior work [Ser07, OS11]
has shown that for any LTF, there is a weight vector realizing that LTF that has essentially the
properties we need; however, since the exact technical condition that we require is not guaranteed
by any of the previous works, we give a full proof that any LTF has a representation of the desired
form. The following theorem is proved in the appendix.

Theorem 3. Let $f : \{-1,1\}^n \to \{-1,1\}$ be an $\eta$-reasonable LTF and $k \in [2,n]$. There exists a
representation of $f$ as $f(x) = \mathrm{sign}(v_0 + \sum_{i=1}^n v_i x_i)$ such that (after reordering coordinates so that
condition (i) below holds) we have: (i) $|v_i| \ge |v_{i+1}|$, $i \in [n-1]$; (ii) $|v_0| \le (1-\eta)\sum_{i=1}^n |v_i|$; and
(iii) for all $i \in [0, k-1]$ we have $|v_i| \le (2/\eta) \cdot \sqrt{n} \cdot k^{k/2} \cdot \sigma_k$, where $\sigma_k := \sqrt{\sum_{j \ge k} v_j^2}$.

Tools from probability. We will use the following standard tail bound:

Theorem 4. (Chernoff Bounds) Let $X$ be a random variable taking values in $[-a, a]$ and let
$X_1, \dots, X_t$ be i.i.d. samples drawn from $X$. Let $\bar X = \sum_{i=1}^t X_i / t$. Then for any $\gamma > 0$, we have

$$\Pr\left[|\bar X - \mathbf{E}[X]| \ge \gamma\right] \le 2\exp\left(-\gamma^2 t / (2a^2)\right).$$
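As a small worked use of Theorem 4, solving $2\exp(-\gamma^2 t/(2a^2)) \le \delta$ for $t$ gives $t \ge 2a^2 \ln(2/\delta)/\gamma^2$. The helper below is a hypothetical illustration of that arithmetic; the parameter values are assumptions, with $a = 2$ chosen because each sample $f(x^+(\pi,i)) - f(x(\pi,i))$ lies in $[-2, 2]$.

```python
import math

def samples_needed(a, gamma, delta):
    """Smallest integer t with 2 * exp(-gamma^2 * t / (2 * a^2)) <= delta."""
    return math.ceil(2 * a * a * math.log(2 / delta) / (gamma * gamma))

def failure_bound(a, gamma, t):
    """Right-hand side of the tail bound in Theorem 4."""
    return 2 * math.exp(-gamma * gamma * t / (2 * a * a))

t = samples_needed(a=2, gamma=0.05, delta=0.01)
print(t, failure_bound(2, 0.05, t))
```

The required number of samples grows only logarithmically in $1/\delta$ but quadratically in $a/\gamma$.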

We will also use the Littlewood-Offord inequality for $p$-biased distributions over $\{-1,1\}^n$. One
way to prove this is by using the LYM inequality (which can be found e.g. as Theorem 8.6 of
[Juk01]); for an explicit reference and proof of the following statement see e.g. [AGKW09].

Theorem 5. Fix $\delta \in (0,1)$ and let $D_\delta$ denote the $\delta$-biased distribution over $\{-1,1\}^n$ (under which
each coordinate is set to 1 independently with probability $\delta$). Fix $w \in \mathbb{R}^n$ and define $S = \{i : |w_i| \ge \epsilon\}$.
If $|S| \ge K$, then for all $\theta \in \mathbb{R}$ we have $\Pr_{x \sim D_\delta}[|w \cdot x - \theta| < \epsilon] \le \frac{1}{\sqrt{K\delta(1-\delta)}}$.

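Basic Facts about function spaces. We will use the following basic facts:

Fact 6. The $n+1$ functions $1, x_1, \dots, x_n$ are linearly independent and form a basis for the subspace
$V = \{f : \{-1,1\}^n \to \mathbb{R} \text{ and } f \text{ is linear}\}$.

Fact 7. Fix any $\Omega \subseteq \{-1,1\}^n$ and let $\mu$ be a probability distribution over $\Omega$ such that $\mu(x) > 0$
for all $x \in \Omega$. We define $\langle f, g \rangle_\mu := \mathbf{E}_{\omega \sim \mu}[f(\omega)g(\omega)]$ for $f, g : \Omega \to \mathbb{R}$. Suppose that $f_1, \dots, f_m :
\Omega \to \mathbb{R}$ is an orthonormal set of functions, i.e., $\langle f_i, f_j \rangle_\mu = \delta_{ij}$ for all $i, j \in [m]$. Then we have
$\langle f, f \rangle_\mu \ge \sum_{i=1}^m \langle f, f_i \rangle_\mu^2$. As a corollary, if $f, h : \Omega \to \{-1,1\}$ then we have

$$\sqrt{\sum_{i=1}^m \langle f - h, f_i \rangle_\mu^2} \le 2\sqrt{\Pr_{x \sim \mu}[f(x) \ne h(x)]}.$$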

3 Analytic Reformulation of Shapley values 

The definition of Shapley values given in Definition [1] is somewhat cumbersome to work with. In this 
section we derive alternate characterizations of Shapley values in terms of "Fourier coefficients" and 
"coordinate correlation coefficients" and establish various technical results relating Shapley values 
and these coefficients; these technical results will be crucially used in the proof of our main theorems. 

There is a particular distribution $\mu$ that plays a central role in our reformulations. We start by
defining this distribution $\mu$ and introducing some relevant notation, and then give our results.

The distribution $\mu$. Let us define $\Lambda(n) := \sum_{0<k<n} \left(\frac{1}{k} + \frac{1}{n-k}\right)$; clearly we have $\Lambda(n) = \Theta(\log n)$, and
more precisely we have $\Lambda(n) \le 2\log n$. We also define $Q(n,k)$ as $Q(n,k) := \frac{1}{k} + \frac{1}{n-k}$ for $0 < k < n$,
so we have $\Lambda(n) = \sum_{k=1}^{n-1} Q(n,k)$.

For $x \in \{-1,1\}^n$ we write $\mathrm{wt}(x)$ to denote the number of 1's in $x$. We define the set $B_n$ to be
$B_n := \{x \in \{-1,1\}^n : 0 < \mathrm{wt}(x) < n\}$, i.e., $B_n = \{-1,1\}^n \setminus \{\mathbf{1}, -\mathbf{1}\}$.

The distribution $\mu$ is supported on $B_n$ and is defined as follows: to make a draw from $\mu$, sample
$k \in \{1, \dots, n-1\}$ with probability $Q(n,k)/\Lambda(n)$. Choose $x \in \{-1,1\}^n$ uniformly at random from
the $k$-th "weight level" of $\{-1,1\}^n$, i.e., from $\{-1,1\}^n_{=k} := \{x \in \{-1,1\}^n : \mathrm{wt}(x) = k\}$.
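The two-stage definition of $\mu$ translates directly into code. The following sketch (with hypothetical function names; the choice $n = 5$ is arbitrary) computes the exact probability mass of each point using rational arithmetic and draws samples by first choosing the weight level and then a uniformly random point on that level.

```python
import random
from fractions import Fraction
from itertools import product
from math import comb

def Q(n, k):
    return Fraction(1, k) + Fraction(1, n - k)

def Lam(n):
    return sum(Q(n, k) for k in range(1, n))

def mu(x):
    """Exact probability of x under mu; zero on the two constant strings."""
    n, k = len(x), x.count(1)
    if k == 0 or k == n:
        return Fraction(0)
    # Level k is chosen with probability Q(n,k)/Lam(n), then x uniformly among C(n,k) points.
    return Q(n, k) / (Lam(n) * comb(n, k))

def sample_mu(n, rng):
    levels = list(range(1, n))
    k = rng.choices(levels, weights=[float(Q(n, kk)) for kk in levels])[0]
    ones = set(rng.sample(range(n), k))       # positions of the k coordinates equal to 1
    return tuple(1 if j in ones else -1 for j in range(n))

n = 5
total = sum(mu(x) for x in product((-1, 1), repeat=n))
draw = sample_mu(n, random.Random(0))
print(Lam(n), total, draw)
```

Since $Q(n,k) = Q(n,n-k)$, the distribution is symmetric under negating all coordinates, which is why each coordinate has mean zero under $\mu$.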

Useful notation. For $i = 0, \dots, n$ we define the "coordinate correlation coefficients" of a function
$f : \{-1,1\}^n \to \mathbb{R}$ (with respect to $\mu$) as:

$$f^*(i) := \mathbf{E}_{x \sim \mu}[f(x) \cdot x_i] \qquad (2)$$

(here and throughout the paper $x_0$ denotes the constant 1).

Later in this section we will define an orthonormal set of linear functions $L_0, L_1, \dots, L_n :
\{-1,1\}^n \to \mathbb{R}$. We define the "Fourier coefficients" of $f$ (with respect to $\mu$) as:

$$\hat f(i) := \mathbf{E}_{x \sim \mu}[f(x) \cdot L_i(x)]. \qquad (3)$$

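An alternative expression for the Shapley values. We start by expressing the Shapley values
in terms of the coordinate correlation coefficients:

Lemma 8. Given $f : \{-1,1\}^n \to [-1,1]$, for each $i = 1, \dots, n$ we have

$$\tilde f(i) = \frac{f(\mathbf{1}) - f(-\mathbf{1})}{n} + \frac{\Lambda(n)}{2} \cdot \left( f^*(i) - \frac{1}{n}\sum_{j=1}^n f^*(j) \right),$$

or equivalently,

$$f^*(i) = \frac{2}{\Lambda(n)} \cdot \left( \tilde f(i) - \frac{f(\mathbf{1}) - f(-\mathbf{1})}{n} \right) + \frac{1}{n}\sum_{j=1}^n f^*(j).$$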
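Proof. Recall that $\tilde f(i)$ can be expressed as follows:

$$\tilde f(i) = \mathbf{E}_{\pi \sim_R S_n}\left[ f(x^+(\pi,i)) - f(x(\pi,i)) \right]. \qquad (4)$$

Since the $i$-th coordinate of $x^+(\pi,i)$ is $1$ and the $i$-th coordinate of $x(\pi,i)$ is $-1$, we see that $\tilde f(i)$
is a weighted sum of $\{f(x)x_i\}_{x \in \{-1,1\}^n}$. We now compute the weights associated with any such
$x \in \{-1,1\}^n$.

• Let $x$ be a string that has $\mathrm{wt}(x)$ coordinates that are 1 and has $x_i = 1$. Then the total number
of permutations $\pi \in S_n$ such that $x^+(\pi,i) = x$ is $(\mathrm{wt}(x)-1)!\,(n-\mathrm{wt}(x))!$. Consequently the
weight associated with $f(x)x_i$ for such an $x$ is $(\mathrm{wt}(x)-1)! \cdot (n-\mathrm{wt}(x))!/n!$.

• Now let $x$ be a string that has $\mathrm{wt}(x)$ coordinates that are 1 and has $x_i = -1$. Then the total
number of permutations $\pi \in S_n$ such that $x(\pi,i) = x$ is $\mathrm{wt}(x)!\,(n-\mathrm{wt}(x)-1)!$. Consequently
the weight associated with $f(x)x_i$ for such an $x$ is $\mathrm{wt}(x)! \cdot (n-\mathrm{wt}(x)-1)!/n!$.

Thus we may rewrite Equation (4) as

$$\tilde f(i) = \sum_{x \in \{-1,1\}^n : x_i = 1} \frac{(\mathrm{wt}(x)-1)!\,(n-\mathrm{wt}(x))!}{n!} \cdot f(x) \cdot x_i \; + \sum_{x \in \{-1,1\}^n : x_i = -1} \frac{\mathrm{wt}(x)!\,(n-\mathrm{wt}(x)-1)!}{n!} \cdot f(x) \cdot x_i .$$

Let us now define $\nu(f) := (f(\mathbf{1}) - f(-\mathbf{1}))/n$. Using the fact that $x_i^2 = 1$, it is easy to see that one
gets

$$2\tilde f(i) = 2\nu(f) + \sum_{x \in B_n} 2 f(x) \cdot \frac{(\mathrm{wt}(x)-1)!\,(n-\mathrm{wt}(x)-1)!}{n!} \cdot \left( \left(\frac{n}{2} - \mathrm{wt}(x)\right) + \frac{n x_i}{2} \right)$$

$$= 2\nu(f) + \sum_{x \in B_n} f(x) \cdot \frac{(\mathrm{wt}(x)-1)!\,(n-\mathrm{wt}(x)-1)!}{(n-1)!} \cdot \left( x_i + \frac{n - 2\mathrm{wt}(x)}{n} \right)$$

$$= 2\nu(f) + \sum_{x \in B_n} f(x) \cdot \left( \frac{n}{\mathrm{wt}(x)(n-\mathrm{wt}(x))\binom{n}{\mathrm{wt}(x)}} \cdot x_i + \frac{1}{\mathrm{wt}(x)(n-\mathrm{wt}(x))\binom{n}{\mathrm{wt}(x)}} \cdot (n - 2\mathrm{wt}(x)) \right). \qquad (5)$$

We next observe that $n - 2\mathrm{wt}(x) = -\left(\sum_{j \in [n]} x_j\right)$. Next, let us define $P(n,k)$ (for $k \in [1, n-1]$) as
follows:

$$P(n,k) := \frac{Q(n,k)}{\binom{n}{k}} = \frac{\frac{1}{k} + \frac{1}{n-k}}{\binom{n}{k}}.$$

So we may rewrite Equation (5) in terms of $P(n, \mathrm{wt}(x))$ as

$$2\tilde f(i) = 2\nu(f) + \sum_{x \in B_n} \left[ f(x) \cdot x_i \cdot P(n, \mathrm{wt}(x)) \right] - \sum_{x \in B_n} \left[ f(x) \cdot P(n, \mathrm{wt}(x)) \cdot \Big(\sum_{j=1}^n x_j\Big)/n \right].$$

We have

$$\sum_{x \in B_n} P(n, \mathrm{wt}(x)) = \sum_{k=1}^{n-1} \sum_{x \in \{-1,1\}^n_{=k}} P(n, \mathrm{wt}(x)) = \sum_{k=1}^{n-1} \binom{n}{k} P(n,k) = \sum_{k=1}^{n-1} Q(n,k) = \Lambda(n),$$

so that $\mu(x) = P(n, \mathrm{wt}(x))/\Lambda(n)$ for every $x \in B_n$, and consequently we get

$$2\tilde f(i) = 2\nu(f) + \Lambda(n) \cdot \mathbf{E}_{x \sim \mu}[f(x) \cdot x_i] - \Lambda(n) \cdot \mathbf{E}_{x \sim \mu}\left[ f(x) \cdot \Big(\sum_{j=1}^n x_j\Big)/n \right],$$

finishing the proof. □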
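Because Lemma 8 is an exact identity, it can be sanity-checked by brute force on a small instance. The sketch below is only an illustration: the test function is an arbitrary (hypothetical) table of rational values in $[-1,1]$, $n = 4$ is an arbitrary choice, and exact rational arithmetic is used so that both sides can be compared for equality.

```python
import random
from fractions import Fraction
from itertools import permutations, product
from math import comb

n = 4
cube = list(product((-1, 1), repeat=n))
rng = random.Random(1)
# Hypothetical test function: an arbitrary table of values in [-1, 1].
table = {x: Fraction(rng.randint(-10, 10), 10) for x in cube}

def f(x):
    return table[x]

def Q(k):
    return Fraction(1, k) + Fraction(1, n - k)

LAMBDA = sum(Q(k) for k in range(1, n))

def mu(x):
    k = x.count(1)
    return Fraction(0) if k in (0, n) else Q(k) / (LAMBDA * comb(n, k))

def shapley(i):
    """Generalized Shapley value of Definition 1, by enumerating permutations."""
    perms = list(permutations(range(n)))
    total = Fraction(0)
    for pi in perms:
        x = tuple(1 if pi[j] < pi[i] else -1 for j in range(n))
        x_plus = x[:i] + (1,) + x[i + 1:]
        total += f(x_plus) - f(x)
    return total / len(perms)

def corr(i):
    """Coordinate correlation coefficient f*(i) under mu."""
    return sum(mu(x) * f(x) * x[i] for x in cube)

nu = (f((1,) * n) - f((-1,) * n)) / n
mean_corr = sum(corr(j) for j in range(n)) / n
lhs = [shapley(i) for i in range(n)]
rhs = [nu + LAMBDA / 2 * (corr(i) - mean_corr) for i in range(n)]
print(lhs == rhs)
```

The check does not require $f$ to be an LTF or even Boolean-valued, consistent with the lemma being stated for arbitrary bounded functions.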



Construction of a Fourier basis for distribution $\mu$. For all $x \in B_n$ we have that $\mu(x) > 0$, and
consequently by Fact 6 we know that the functions $1, x_1, \dots, x_n$ form a basis for the subspace of
linear functions from $B_n \to \mathbb{R}$. By Gram-Schmidt orthogonalization, we can obtain an orthonormal
basis $L_0, \dots, L_n$ for this subspace, i.e., a set of linear functions such that $\langle L_i, L_i \rangle_\mu = 1$ for all $i$ and
$\langle L_i, L_j \rangle_\mu = 0$ for all $i \ne j$.

We now give explicit expressions for these basis functions. We start by defining $L_0 : B_n \to \mathbb{R}$
as $L_0 : x \mapsto 1$. Next, by symmetry, we can express each $L_i$ as

$$L_i(x) = \alpha(x_1 + \dots + x_n) + \beta x_i.$$

Using the orthonormality properties it is straightforward to solve for $\alpha$ and $\beta$. The following lemma
gives the values of $\alpha$ and $\beta$:

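Lemma 9. For the choices

$$\alpha := \frac{1}{n} \cdot \left( \sqrt{\frac{\Lambda(n)}{n\Lambda(n) - 4(n-1)}} - \frac{\sqrt{\Lambda(n)}}{2} \right), \qquad \beta := \frac{\sqrt{\Lambda(n)}}{2},$$

the set $\{L_i\}_{i=0}^n$ is an orthonormal set of linear functions under the distribution $\mu$.

We note for later reference that $\alpha = -\Theta\left(\frac{\sqrt{\log n}}{n}\right)$ and $\beta = \Theta(\sqrt{\log n})$.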



We start with the following proposition which gives an explicit expression for $\mathbf{E}_{x \sim \mu}[x_i x_j]$ when
$i \ne j$; we will use it in the proof of Lemma 9.

Proposition 10. For all $1 \le i < j \le n$ we have $\mathbf{E}_{x \sim \mu}[x_i x_j] = 1 - \frac{4}{\Lambda(n)}$.

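Proof. For brevity let us write $A_k := \{-1,1\}^n_{=k}$, i.e., $A_k = \{x \in \{-1,1\}^n : \mathrm{wt}(x) = k\}$, the $k$-th
"slice" of the hypercube. Since $\mu$ is supported on $B_n = \cup_{k=1}^{n-1} A_k$, we have

$$\mathbf{E}_{x \sim \mu}[x_i x_j] = \sum_{0<k<n} \mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k] \cdot \Pr_{x \sim \mu}[x \in A_k].$$

If $k = 1$ or $n-1$, it is clear that

$$\mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k] = 1 - \frac{2}{n} - \frac{2}{n} = 1 - \frac{4}{n},$$

and when $2 \le k \le n-2$, we have

$$\mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k] = \frac{1}{\binom{n}{k}} \cdot \left( 2\binom{n-2}{k-2} + 2\binom{n-2}{k} - \binom{n}{k} \right).$$

Recall that $\Lambda(n) = \sum_{0<k<n} \left(\frac{1}{k} + \frac{1}{n-k}\right)$ and $Q(n,k) = \frac{1}{k} + \frac{1}{n-k}$ for $0 < k < n$. This means that we
have

$$\Pr_{x \sim \mu}[x \in A_k] = Q(n,k)/\Lambda(n).$$

Thus we may write $\mathbf{E}_{x \sim \mu}[x_i x_j]$ as

$$\mathbf{E}_{x \sim \mu}[x_i x_j] = \sum_{2 \le k \le n-2} \frac{Q(n,k)}{\Lambda(n)} \cdot \mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k] \; + \sum_{k \in \{1, n-1\}} \frac{Q(n,k)}{\Lambda(n)} \cdot \mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k].$$

For the latter sum, we have

$$\sum_{k \in \{1, n-1\}} \frac{Q(n,k)}{\Lambda(n)} \cdot \mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k] = \frac{1}{\Lambda(n)} \left(1 - \frac{4}{n}\right) \cdot \frac{2n}{n-1}.$$

For the former, we can write

$$\sum_{k=2}^{n-2} \frac{Q(n,k)}{\Lambda(n)} \cdot \mathbf{E}_{x \sim \mu}[x_i x_j \mid x \in A_k]
= \sum_{k=2}^{n-2} \frac{1}{\Lambda(n)} \cdot \frac{(k-1)!\,(n-k-1)!}{(n-1)!} \cdot \left( 2\binom{n-2}{k-2} + 2\binom{n-2}{k} - \binom{n}{k} \right)$$

$$= \sum_{k=2}^{n-2} \frac{1}{\Lambda(n)} \cdot \left( \frac{2(k-1)}{(n-1)(n-k)} + \frac{2(n-k-1)}{(n-1)k} - \frac{n}{k(n-k)} \right)$$

$$= \sum_{k=2}^{n-2} \frac{1}{\Lambda(n)} \cdot \left( \frac{2}{n-k} - \frac{2}{n-1} + \frac{2}{k} - \frac{2}{n-1} - \frac{1}{k} - \frac{1}{n-k} \right)$$

$$= \sum_{k=2}^{n-2} \frac{1}{\Lambda(n)} \cdot \left( \frac{1}{n-k} + \frac{1}{k} - \frac{4}{n-1} \right).$$

Thus, we get that overall $\mathbf{E}_{x \sim \mu}[x_i x_j]$ equals

$$\frac{1}{\Lambda(n)} \left(1 - \frac{4}{n}\right) \frac{2n}{n-1} + \sum_{k=2}^{n-2} \frac{1}{\Lambda(n)} \left( \frac{1}{n-k} + \frac{1}{k} - \frac{4}{n-1} \right)$$

$$= \frac{1}{\Lambda(n)} \left( 2 + \frac{2}{n-1} - \frac{8}{n-1} \right) + \frac{1}{\Lambda(n)} \sum_{k=2}^{n-2} \left( \frac{1}{k} + \frac{1}{n-k} \right) - \frac{4}{\Lambda(n)} \cdot \frac{n-3}{n-1}$$

$$= \frac{1}{\Lambda(n)} \sum_{k=1}^{n-1} \left( \frac{1}{k} + \frac{1}{n-k} \right) - \frac{4}{\Lambda(n)} = 1 - \frac{4}{\Lambda(n)},$$

as was to be shown. □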
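Proposition 10 can be confirmed numerically for small $n$ by summing over the support of $\mu$. The sketch below uses a hypothetical helper name and exact rational arithmetic; it returns both the brute-force value of $\mathbf{E}_{x \sim \mu}[x_i x_j]$ and the closed form $1 - 4/\Lambda(n)$ so that the two can be compared.

```python
from fractions import Fraction
from itertools import product
from math import comb

def pair_correlation(n, i=0, j=1):
    """Return (brute-force E[x_i x_j] under mu, closed form 1 - 4/Lambda(n))."""
    def Q(k):
        return Fraction(1, k) + Fraction(1, n - k)
    lam = sum(Q(k) for k in range(1, n))
    total = Fraction(0)
    for x in product((-1, 1), repeat=n):
        k = x.count(1)
        if 0 < k < n:                       # mu is supported on B_n only
            total += Q(k) / (lam * comb(n, k)) * x[i] * x[j]
    return total, 1 - 4 / lam

checks = [pair_correlation(n) for n in range(2, 8)]
print(checks)
```

For $n = 4$, for instance, both values equal $-1/11$: the pairwise correlation is negative for small $n$ and approaches 1 only slowly as $\Lambda(n)$ grows, which illustrates how far $\mu$ is from pairwise independence.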



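Proof of Lemma 9. We begin by observing that

$$\mathbf{E}_{x \sim \mu}[L_i(x)L_0(x)] = \mathbf{E}_{x \sim \mu}[L_i(x)] = \mathbf{E}_{x \sim \mu}[\alpha(x_1 + \dots + x_n) + \beta x_i] = 0$$

since $\mathbf{E}_{x \sim \mu}[x_i] = 0$. Next, we solve for $\alpha$ and $\beta$ using the orthonormality conditions on the set
$\{L_i\}_{i=1}^n$. As $\mathbf{E}_{x \sim \mu}[L_i(x)L_j(x)] = 0$ for $i \ne j$ and $\mathbf{E}_{x \sim \mu}[L_i(x)L_i(x)] = 1$, we get that
$\mathbf{E}_{x \sim \mu}[L_i(x)(L_i(x) - L_j(x))] = 1$. This gives

$$\mathbf{E}_{x \sim \mu}[L_i(x) \cdot (L_i(x) - L_j(x))] = \mathbf{E}_{x \sim \mu}[L_i(x) \cdot \beta(x_i - x_j)]$$

$$= \mathbf{E}_{x \sim \mu}[\beta((\alpha + \beta)x_i + \alpha x_j) \cdot (x_i - x_j)]$$

$$= \alpha\beta + \beta^2 - \alpha\beta - \beta^2\,\mathbf{E}_{x \sim \mu}[x_j x_i]$$

$$= \beta^2(1 - \mathbf{E}_{x \sim \mu}[x_i x_j]) = 4\beta^2/\Lambda(n) = 1,$$

where the penultimate equation above uses Proposition 10. Thus, we have shown that $\beta = \frac{\sqrt{\Lambda(n)}}{2}$.
To solve for $\alpha$, we note that

$$\sum_{i=1}^n L_i(x) = (\alpha n + \beta)(x_1 + \dots + x_n).$$

However, since the set $\{L_i\}_{i=1}^n$ is orthonormal with respect to the distribution $\mu$, we get that

$$\mathbf{E}_{x \sim \mu}[(L_1(x) + \dots + L_n(x))(L_1(x) + \dots + L_n(x))] = n$$

and consequently

$$(\alpha n + \beta)^2 \cdot \mathbf{E}_{x \sim \mu}[(x_1 + \dots + x_n)(x_1 + \dots + x_n)] = n.$$

Now, using Proposition 10, we get

$$\mathbf{E}_{x \sim \mu}[(x_1 + \dots + x_n)(x_1 + \dots + x_n)] = \sum_{i=1}^n \mathbf{E}_{x \sim \mu}[x_i^2] + \sum_{i \ne j} \mathbf{E}_{x \sim \mu}[x_i x_j] = n + n(n-1) \cdot \left(1 - \frac{4}{\Lambda(n)}\right).$$

Thus, we get that

$$(\alpha n + \beta)^2 \cdot \left[ n + n(n-1) \cdot \left(1 - \frac{4}{\Lambda(n)}\right) \right] = n.$$

Simplifying further,

$$(\alpha n + \beta) = \sqrt{\frac{\Lambda(n)}{n\Lambda(n) - 4(n-1)}}$$

and thus

$$\alpha = \frac{1}{n} \cdot \left( \sqrt{\frac{\Lambda(n)}{n\Lambda(n) - 4(n-1)}} - \frac{\sqrt{\Lambda(n)}}{2} \right)$$

as was to be shown. □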
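The orthonormality claim of Lemma 9 can also be checked numerically by forming the Gram matrix of $L_0, \dots, L_n$ under $\mu$. The sketch below is illustrative (the function name is hypothetical and floating-point arithmetic is used because of the square roots). It assumes $n \ge 3$: for $n = 2$ the quantity $n\Lambda(n) - 4(n-1)$ vanishes, so the formula for $\alpha$ is undefined there.

```python
import math
from itertools import product

def gram_matrix(n):
    """Matrix of inner products <L_i, L_j>_mu for i, j = 0..n (requires n >= 3)."""
    def Q(k):
        return 1 / k + 1 / (n - k)
    lam = sum(Q(k) for k in range(1, n))
    beta = math.sqrt(lam) / 2
    alpha = (math.sqrt(lam / (n * lam - 4 * (n - 1))) - beta) / n
    points = [x for x in product((-1, 1), repeat=n) if 0 < x.count(1) < n]
    mass = [Q(x.count(1)) / (lam * math.comb(n, x.count(1))) for x in points]

    def L(i, x):
        # L_0 is the constant 1; L_i = alpha * (x_1 + ... + x_n) + beta * x_i for i >= 1.
        return 1.0 if i == 0 else alpha * sum(x) + beta * x[i - 1]

    return [[sum(m * L(i, x) * L(j, x) for m, x in zip(mass, points))
             for j in range(n + 1)] for i in range(n + 1)]

G = gram_matrix(5)
print([[round(v, 6) for v in row] for row in G])
```

Up to rounding error the result should be the $(n+1) \times (n+1)$ identity matrix.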

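Relating the Shapley values to the Fourier coefficients. The next lemma gives a useful
expression for $\hat f(i)$ in terms of $\tilde f(i)$.

Lemma 11. Let $f : \{-1,1\}^n \to [-1,1]$ be any bounded function. Then for each $i = 1, \dots, n$ we
have

$$\hat f(i) = \frac{2\beta}{\Lambda(n)} \cdot \left( \tilde f(i) - \frac{f(\mathbf{1}) - f(-\mathbf{1})}{n} \right) + \frac{1}{n} \cdot \sum_{j=1}^n \hat f(j).$$

Proof. Lemma 9 gives us that $L_i(x) = \alpha(x_1 + \dots + x_n) + \beta x_i$, and thus we have

$$\hat f(i) = \mathbf{E}_{x \sim \mu}[f(x) \cdot L_i(x)] = \alpha \left( \sum_{j=1}^n \mathbf{E}_{x \sim \mu}[f(x) \cdot x_j] \right) + \beta\,\mathbf{E}_{x \sim \mu}[f(x) \cdot x_i] = \alpha \sum_{j=1}^n f^*(j) + \beta f^*(i). \qquad (6)$$

Summing this for $i = 1$ to $n$, we get that

$$\sum_{j=1}^n \hat f(j) = (\alpha n + \beta) \sum_{i=1}^n f^*(i).$$

Plugging this into (6), we get that

$$f^*(i) = \frac{1}{\beta} \cdot \left( \hat f(i) - \frac{\alpha}{\alpha n + \beta} \cdot \sum_{j=1}^n \hat f(j) \right). \qquad (7)$$

Now recall that from Lemma 8 we have

$$\tilde f(i) = \nu(f) + \frac{\Lambda(n)}{2} \cdot \left( \mathbf{E}_{x \sim \mu}[f(x) \cdot x_i] - \mathbf{E}_{x \sim \mu}\left[ f(x) \cdot \Big(\sum_{j=1}^n x_j\Big)/n \right] \right) = \nu(f) + \frac{\Lambda(n)}{2} \cdot \left( f^*(i) - \frac{1}{n}\sum_{j=1}^n f^*(j) \right),$$

where $\nu(f) = (f(\mathbf{1}) - f(-\mathbf{1}))/n$. Hence, combining the above with (7) and the summation identity that precedes it, we get

$$\frac{1}{\beta} \cdot \left( \hat f(i) - \frac{\alpha}{\alpha n + \beta} \cdot \sum_{j=1}^n \hat f(j) \right) = \frac{2}{\Lambda(n)} \cdot \left( \tilde f(i) - \nu(f) \right) + \frac{1}{n(\alpha n + \beta)} \cdot \sum_{j=1}^n \hat f(j).$$

From this, it follows that

$$\frac{1}{\beta} \cdot \hat f(i) = \frac{2}{\Lambda(n)} \cdot \left( \tilde f(i) - \nu(f) \right) + \frac{1}{\alpha n + \beta} \cdot \left( \frac{1}{n} + \frac{\alpha}{\beta} \right) \cdot \sum_{j=1}^n \hat f(j),$$

and hence

$$\hat f(i) = \frac{2\beta}{\Lambda(n)} \cdot \left( \tilde f(i) - \nu(f) \right) + \frac{1}{n} \cdot \sum_{j=1}^n \hat f(j) \qquad (8)$$

as desired. □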



Bounding Shapley distance in terms of Fourier distance. Recall that the Shapley distance
$d_{\mathrm{Shapley}}(f,g)$ between $f, g : \{-1,1\}^n \to [-1,1]$ is defined as $d_{\mathrm{Shapley}}(f,g) := \sqrt{\sum_{i=1}^n (\tilde f(i) - \tilde g(i))^2}$.
We define the Fourier distance between $f$ and $g$ as $d_{\mathrm{Fourier}}(f,g) := \sqrt{\sum_{i=0}^n (\hat f(i) - \hat g(i))^2}$.

Our next lemma shows that if the Fourier distance between $f$ and $g$ is small then so is the
Shapley distance.

Lemma 12. Let f,g : {-1, 1}" [-1, 1]. Then, 

4 A(n) 

dshaplcy(/, 5) < ^ + ■ C?Fouricr(/,5)- 

y n zp 



Proof. Let = (/(I) — /(— l))/n and 7^(51) = (g(l) — g{—l))/n. From Lemma [TT\ we have that 
for all 1 < -i < n. 
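Using a similar relation for $g$, we get that for every $1 \le i \le n$,

$$\frac{\Lambda(n)}{2\beta}\left(\hat f(i) - \hat g(i) - \Big(\frac{\sum_{j=1}^n \hat f(j)}{n} - \frac{\sum_{j=1}^n \hat g(j)}{n}\Big)\right) + \nu(f) - \nu(g) = \tilde f(i) - \tilde g(i).$$

We next define the following vectors: let $v \in \mathbb{R}^n$ be defined by $v_i = \tilde f(i) - \tilde g(i)$, $i \in [n]$ (so our goal is to bound $\|v\|_2$). Let $u \in \mathbb{R}^n$ be defined by $u_i = \nu(f) - \nu(g)$, $i \in [n]$. Finally, let $w \in \mathbb{R}^n$ be defined by

$$w_i = \hat f(i) - \hat g(i) - \Big(\frac{\sum_{j=1}^n \hat f(j)}{n} - \frac{\sum_{j=1}^n \hat g(j)}{n}\Big), \quad i \in [n].$$

With these definitions the vectors $u$, $v$ and $w$ satisfy $\frac{\Lambda(n)}{2\beta}\cdot w + u = v$, and hence we have

$$\|v\|_2 \le \|u\|_2 + \frac{\Lambda(n)}{2\beta}\cdot\|w\|_2.$$

Since the range of $f$ and $g$ is $[-1,1]$, each $|u_i|$ is at most $4/n$, so we immediately have that

$$\|u\|_2 \le \sqrt{n}\cdot\frac{4}{n} = \frac{4}{\sqrt{n}},$$

so all that remains is to bound $\|w\|_2$ from above. To do this, let us define another vector $w' \in \mathbb{R}^n$ by $w'_i = \hat f(i) - \hat g(i)$. Let $e \in \mathbb{R}^n$ denote the unit vector $e = (1/\sqrt{n}, \ldots, 1/\sqrt{n})$. Letting $w'_e$ denote the projection of $w'$ along $e$, it is easy to see that

$$w'_e = \Big(\frac{\sum_{j=1}^n \hat f(j)}{n} - \frac{\sum_{j=1}^n \hat g(j)}{n}\Big)\cdot(1, \ldots, 1).$$

This means that $w = w' - w'_e$ and that $w$ is the projection of $w'$ onto the space orthogonal to $e$. Consequently we have $\|w\|_2 \le \|w'\|_2$, and hence

$$\|v\|_2 \le \frac{4}{\sqrt{n}} + \frac{\Lambda(n)}{2\beta}\,\|w'\|_2.$$

Since $\|w'\|_2 \le d_{\mathrm{Fourier}}(f,g)$, this is the bound that was to be shown. □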
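The final step of the proof of Lemma 12 uses only the fact that removing the component of a vector along $e = (1/\sqrt{n},\ldots,1/\sqrt{n})$, which amounts to subtracting the mean from every coordinate, cannot increase its Euclidean norm. The following is a minimal numeric sketch of that step. The function names and the sample vector are illustrative assumptions, not values from the paper.

```python
import math

def center(vec):
    """Project vec onto the subspace orthogonal to e = (1/sqrt(n), ..., 1/sqrt(n)).

    The projection along e is (mean, ..., mean), so the orthogonal part is
    obtained by subtracting the mean from every coordinate.
    """
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]

def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))

# Hypothetical coordinate differences \hat f(i) - \hat g(i) (the vector w').
w_prime = [0.30, -0.10, 0.25, 0.05]
# The vector w from the proof: w' with its component along e removed.
w = center(w_prime)

print(l2_norm(w), "<=", l2_norm(w_prime))
```

Because $w$ and $w'_e$ are orthogonal, $\|w'\|_2^2 = \|w\|_2^2 + \|w'_e\|_2^2$, which is why the inequality holds for every input vector and not only for this example.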



Bounding Fourier distance by "correlation distance." The following lemma will be useful 
for us since it lets us bound from above Fourier distance in terms of the distance between vectors 
of correlations with individual variables: 

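Lemma 13. Let $f, g : \{-1,1\}^n \to \mathbb{R}$. Then we have

$$d_{\mathrm{Fourier}}(f,g) \le O(\sqrt{\log n})\cdot\sqrt{\sum_{i=0}^n (f^*(i) - g^*(i))^2}.$$

Proof. We first observe that $\hat f(0) = f^*(0)$ and $\hat g(0) = g^*(0)$, so $(\hat f(0) - \hat g(0))^2 = (f^*(0) - g^*(0))^2$. Consequently it suffices to prove that

$$\sum_{i=1}^n (\hat f(i) - \hat g(i))^2 \le O(\log n)\cdot\sum_{i=1}^n (f^*(i) - g^*(i))^2,$$

which is what we show below.
From (6), we get

$$\hat f(i) = \alpha\sum_{j=1}^n f^*(j) + \beta f^*(i) \quad\text{and}\quad \hat g(i) = \alpha\sum_{j=1}^n g^*(j) + \beta g^*(i),$$

and thus we have

$$\hat f(i) - \hat g(i) = \alpha\Big(\sum_{j=1}^n f^*(j) - \sum_{j=1}^n g^*(j)\Big) + \beta\,(f^*(i) - g^*(i)).$$

Now consider vectors $u, v, w \in \mathbb{R}^n$ where for $i \in [n]$,

$$u_i = \hat f(i) - \hat g(i), \qquad v_i = \sum_{j=1}^n f^*(j) - \sum_{j=1}^n g^*(j), \qquad w_i = f^*(i) - g^*(i).$$

By combining the triangle inequality and Cauchy-Schwarz, we have

$$\|u\|_2^2 \le 2\big(\alpha^2\|v\|_2^2 + \beta^2\|w\|_2^2\big),$$

and moreover

$$\|v\|_2^2 = n\Big(\sum_{j=1}^n f^*(j) - \sum_{j=1}^n g^*(j)\Big)^2 \le n^2\sum_{j=1}^n (f^*(j) - g^*(j))^2 = n^2\|w\|_2^2.$$

Hence, we obtain

$$\|u\|_2^2 \le 2(\alpha^2 n^2 + \beta^2)\|w\|_2^2.$$

Recalling that $\alpha^2 n^2 = O(\log n)$ and $\beta^2 = O(\log n)$, we conclude that

$$d_{\mathrm{Fourier}}(f,g) = \sqrt{\sum_{i=0}^n (\hat f(i) - \hat g(i))^2} \le O(\sqrt{\log n})\cdot\sqrt{\sum_{i=0}^n (f^*(i) - g^*(i))^2},$$

which completes the proof. □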

From Fourier closeness to $\ell_1$-closeness. An important technical ingredient in our work is the notion of an affine form $\ell(x)$ having "good anti-concentration" under distribution $\mu$; we now give a precise definition to capture this.

Definition 14 (Anti-concentration). Fix $w \in \mathbb{R}^n$ and $\theta \in \mathbb{R}$, and let the affine form $\ell(x)$ be $\ell(x) \stackrel{\mathrm{def}}{=} w\cdot x - \theta$. We say that $\ell(x)$ is $(\delta,\kappa)$-anti-concentrated under $\mu$ if $\Pr_{x\sim\mu}[|\ell(x)| \le \delta] \le \kappa$.

The next lemma plays a crucial role in our results. It essentially shows that for $f = \mathrm{sign}(w\cdot x - \theta)$, if the affine form $\ell(x) = w\cdot x - \theta$ is anti-concentrated, then any bounded function $g : \{-1,1\}^n \to [-1,1]$ that has $d_{\mathrm{Fourier}}(f,g)$ small must in fact be close to $f$ in $\ell_1$ distance under $\mu$.

Lemma 15. Let $f : \{-1,1\}^n \to \{-1,1\}$, $f = \mathrm{sign}(w\cdot x - \theta)$ be such that $w\cdot x - \theta$ is $(\delta,\kappa)$-anti-concentrated under $\mu$ (for some $\kappa \le 1/2$), where $|\theta| \le \|w\|_1$. Let $g : \{-1,1\}^n \to [-1,1]$ be such that $d_{\mathrm{Fourier}}(f,g) \le \sqrt{\rho}$. Then we have

$$\mathbf{E}_{x\sim\mu}[|f(x) - g(x)|] \le (4\|w\|_1\sqrt{\rho})/\delta + 2\kappa.$$
Proof. Let us rewrite $\ell(x) \stackrel{\mathrm{def}}{=} w\cdot x - \theta$ as a linear combination of the orthonormal basis elements $L_0, L_1, \ldots, L_n$ (w.r.t. $\mu$), i.e.,

$$\ell(x) = \hat\ell(0)L_0 + \sum_{i=1}^n \hat\ell(i)L_i.$$

Recalling the definitions of $L_i$ for $i = 1, \ldots, n$ and the fact that $L_0 = 1$, we get $\hat\ell(0) = -\theta$.
We first establish an upper bound on $\theta^2 + \sum_{j=1}^n \hat\ell(j)^2$ as follows:

$$\theta^2 + \sum_{j=1}^n \hat\ell(j)^2 = \mathbf{E}_{x\sim\mu}[(w\cdot x - \theta)^2] \le 2\,\mathbf{E}_{x\sim\mu}[(w\cdot x)^2] + 2\theta^2 \le 2\|w\|_1^2 + 2\|w\|_1^2 = 4\|w\|_1^2.$$

The first equality above uses the fact that the $L_i$'s are orthonormal under $\mu$, while the first inequality uses $(a+b)^2 \le 2(a^2+b^2)$ for $a, b \in \mathbb{R}$. The second inequality uses the assumed bound on $|\theta|$ and the fact that $|w\cdot x|$ is always at most $\|w\|_1$.
Next, linearity of expectation gives us that

$$\mathbf{E}_{x\sim\mu}[(f(x)-g(x))\cdot(w\cdot x-\theta)] = \theta(\hat g(0) - \hat f(0)) + \sum_{j=1}^n \hat\ell(j)(\hat f(j) - \hat g(j))$$
$$\le \sqrt{\theta^2 + \sum_{j=1}^n \hat\ell(j)^2}\cdot\sqrt{\sum_{j=0}^n (\hat f(j) - \hat g(j))^2} \le 2\|w\|_1\sqrt{\rho} \qquad (9)$$

where the first inequality is Cauchy-Schwarz and the second follows by the conditions of the lemma.
Now note that since $f = \mathrm{sign}(w\cdot x - \theta)$, for all $x \in \{-1,1\}^n$ we have

$$(f(x) - g(x))\cdot(w\cdot x - \theta) = |f(x) - g(x)|\cdot|w\cdot x - \theta|.$$

Let $E$ denote the event that $|w\cdot x - \theta| > \delta$. Using the fact that the affine form $w\cdot x - \theta$ is $(\delta,\kappa)$-anti-concentrated, we get that $\Pr[E] \ge 1 - \kappa$, and hence

$$\mathbf{E}_{x\sim\mu}[(f(x)-g(x))\cdot(w\cdot x-\theta)] \ge \mathbf{E}_{x\sim\mu}[(f(x)-g(x))\cdot(w\cdot x-\theta) \mid E]\cdot\Pr[E]$$
$$\ge \delta(1-\kappa)\,\mathbf{E}_{x\sim\mu}[|f(x)-g(x)| \mid E].$$

Recalling that $\kappa \le 1/2$, this together with (9) implies that

$$\mathbf{E}_{x\sim\mu}[|f(x)-g(x)| \mid E] \le \frac{4\|w\|_1\sqrt{\rho}}{\delta},$$

which in turn implies (since $|f(x) - g(x)| \le 2$ for all $x \in \{-1,1\}^n$) that

$$\mathbf{E}_{x\sim\mu}[|f(x)-g(x)|] \le \frac{4\|w\|_1\sqrt{\rho}}{\delta} + 2\kappa$$

as was to be shown. □
4 A Useful Anti-concentration Result



In this section we prove an anti-concentration result for monotone increasing $\eta$-reasonable affine forms (as defined earlier) under the distribution $\mu$. Note that even if $k$ is a constant the result gives an anti-concentration probability of $O(1/\log n)$; this will be crucial in the proof of our first main result in Section 6.

Theorem 16. Let $L(x) = w_0 + \sum_{i=1}^n w_i x_i$ be a monotone increasing $\eta$-reasonable affine form, so $w_i \ge 0$ for $i \in [n]$ and $|w_0| \le (1-\eta)\sum_{i=1}^n |w_i|$. Let $k \in [n]$, $0 < \zeta < 1/2$, $k \ge 2/\eta$ and $r \in \mathbb{R}_+$ be such that $|S| \ge k$, where $S := \{i \in [n] : |w_i| \ge r\}$. Then

$$\Pr_{x\sim\mu}[|L(x)| < r] = O\left(\frac{1}{\log n}\cdot\frac{1}{k^{1/3-\zeta}}\cdot\Big(\frac{1}{\zeta} + \frac{1}{\eta}\Big)\right).$$

This theorem essentially says that under the distribution $\mu$, the random variable $L(x)$ falls in the interval $[-r, r]$ with only a very small probability. Such theorems are known in the literature as "anti-concentration" results, but almost all such results are for the uniform distribution or for other product distributions, and indeed the proofs of such results typically crucially use the fact that the distributions are product distributions.

In our setting, the distribution $\mu$ is not even a pairwise independent distribution, so standard approaches for proving anti-concentration cannot be directly applied. Instead, we exploit the fact that $\mu$ is a symmetric distribution; a distribution is symmetric if the probability mass it assigns to an $n$-bit string $x \in \{-1,1\}^n$ depends only on the number of 1's of $x$ (and not on their location within the string). This enables us to perform a somewhat delicate reduction to known anti-concentration results for biased product distributions. Our proof adopts a point of view which is inspired by the combinatorial proof of the basic Littlewood-Offord theorem (under the uniform distribution on the hypercube) due to Benjamini et al. [BKS99]. The detailed proof is given in the following subsection.



4.1 Proof of Theorem 16

Recall that $\{-1,1\}^n_{=i}$ denotes the $i$-th "weight level" of the hypercube, i.e., $\{x \in \{-1,1\}^n : \mathrm{wt}(x) = i\}$. We view a random draw $x \sim \mu$ as being done according to a two-stage process:

1. Draw $i \in [n-1]$ with probability $q(n,i) \stackrel{\mathrm{def}}{=} Q(n,i)/\Lambda(n)$. (Note that this is the probability $\mu$ assigns to $\{-1,1\}^n_{=i}$.)

2. Independently pick a uniformly random permutation $\pi : [n] \to [n]$, i.e., $\pi \sim_R S_n$. The string $x$ is defined to have $x_{\pi(1)} = \ldots = x_{\pi(i)} = 1$ and $x_{\pi(i+1)} = \ldots = x_{\pi(n)} = -1$.

It is easy to see that the above description of $\mu$ is equivalent to its original definition. Another crucial observation is that any symmetric distribution can be sampled in the same way, with $q(n,i)$ being the only quantity dependent on the particular distribution. We next define an $(r,i)$-balanced permutation.

Definition 17 ($(r,i)$-balanced permutation). A permutation $\pi : [n] \to [n]$ is called $(r,i)$-balanced if $|w_0 + \sum_{j=1}^i w_{\pi(j)} - \sum_{j=i+1}^n w_{\pi(j)}| < r$.

For $i \in [n-1]$, let us denote by $p(r,i)$ the fraction of all $n!$ permutations that are $(r,i)$-balanced. That is,

$$p(r,i) = \Pr_{\pi\sim_R S_n}\left[\Big|w_0 + \sum_{j=1}^i w_{\pi(j)} - \sum_{j=i+1}^n w_{\pi(j)}\Big| < r\right].$$
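The two-stage description of $\mu$ translates directly into a sampler, and Definition 17 into a one-line predicate. The following is a minimal sketch. It assumes the definition $Q(n,i) = 1/i + 1/(n-i)$ with $\Lambda(n) = \sum_{i=1}^{n-1} Q(n,i)$ from the earlier part of the paper, and the function names are hypothetical.

```python
import random

def Q(n, i):
    # Assumed definition (from earlier in the paper): Q(n, i) = 1/i + 1/(n - i).
    return 1.0 / i + 1.0 / (n - i)

def level_probs(n):
    """Return {i: q(n, i)} for i = 1..n-1, where q(n, i) = Q(n, i) / Lambda(n)."""
    lam = sum(Q(n, i) for i in range(1, n))
    return {i: Q(n, i) / lam for i in range(1, n)}

def sample_mu(n, rng):
    """Two-stage draw from mu: pick a weight level, then a uniform permutation."""
    q = level_probs(n)
    levels = list(q)
    # Stage 1: choose the number i of +1 coordinates with probability q(n, i).
    i = rng.choices(levels, weights=[q[l] for l in levels], k=1)[0]
    # Stage 2: a uniformly random permutation decides which coordinates are +1.
    perm = list(range(n))
    rng.shuffle(perm)
    x = [-1] * n
    for pos in perm[:i]:
        x[pos] = 1
    return x

def is_balanced(w0, w, perm, i, r):
    """Check whether perm (a list of 0-based indices) is (r, i)-balanced."""
    plus = sum(w[j] for j in perm[:i])
    minus = sum(w[j] for j in perm[i:])
    return abs(w0 + plus - minus) < r
```

Replacing `level_probs` by any other distribution over weight levels gives a sampler for an arbitrary symmetric distribution, which is the observation the proof relies on.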
At this point, as done in [BKS99], we use the above two-stage process defining $\mu$ to express the desired "small ball" probability in a more convenient way. Conditioning on the event that the $i$-th layer is selected in the first stage, the probability that $|L(x)| < r$ is $p(r,i)$. By the law of total probability we can write:

$$\Pr_{x\sim\mu}[|L(x)| < r] = \sum_{i=1}^{n-1} p(r,i)\,q(n,i).$$

We again observe that $p(r,i)$ is only dependent on the affine form $L(x)$ and does not depend on the particular symmetric distribution; $q(n,i)$ is the only part dependent on the distribution. The high-level idea of bounding the quantity $\sum_{i=1}^{n-1} p(r,i)\,q(n,i)$ is as follows: For $i$ which are "close to 1 or $n-1$", we use Markov's inequality to argue that the corresponding $p(r,i)$'s are suitably small; this allows us to bound the contribution of these indices to the sum, using the fact that each $q(n,i)$ is small. For the remaining $i$'s, we use the fact that the $p(r,i)$'s are identical for all symmetric distributions. This allows us to perform a subtle "reduction" to known anti-concentration results for biased product distributions.

We start with the following simple claim, a consequence of Markov's inequality, that shows that if one of $i$ or $n-i$ is reasonably small, the probability $p(r,i)$ is quite small.
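For small $n$ the layer decomposition can be checked exactly by enumeration. The sketch below computes the small-ball probability once directly from the point masses of $\mu$ and once as $\sum_i p(r,i)\,q(n,i)$. It assumes $Q(n,i) = 1/i + 1/(n-i)$ as defined earlier in the paper; the weights are arbitrary illustrative values.

```python
from itertools import combinations, product
from math import comb

def Q(n, i):
    # Assumed definition (from earlier in the paper).
    return 1.0 / i + 1.0 / (n - i)

def q(n, i):
    lam = sum(Q(n, j) for j in range(1, n))
    return Q(n, i) / lam

def p(w0, w, r, i):
    """Fraction of i-subsets A (the +1 coordinates) with |L(x)| < r.

    Averaging over i-subsets is the same as averaging over permutations,
    since every i-subset arises from the same number of permutations.
    """
    n, total = len(w), sum(w)
    hits = 0
    for A in combinations(range(n), i):
        plus = sum(w[j] for j in A)
        if abs(w0 + plus - (total - plus)) < r:
            hits += 1
    return hits / comb(n, i)

def small_ball_direct(w0, w, r):
    """Pr_{x ~ mu}[|L(x)| < r], using mu(x) = q(n, i) / C(n, i) for wt(x) = i."""
    n = len(w)
    prob = 0.0
    for x in product([-1, 1], repeat=n):
        i = x.count(1)
        value = w0 + sum(a * b for a, b in zip(w, x))
        if 1 <= i <= n - 1 and abs(value) < r:
            prob += q(n, i) / comb(n, i)
    return prob

def small_ball_layers(w0, w, r):
    n = len(w)
    return sum(p(w0, w, r, i) * q(n, i) for i in range(1, n))

w0, w, r = 0.0, [7, 1, 1, 1, 1, 1, 1, 1], 1.0
print(small_ball_direct(w0, w, r), small_ball_layers(w0, w, r))
```

Only the function `q` depends on the distribution; `p` depends on the affine form alone, which is what makes the later reduction to biased product distributions possible.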

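Claim 18. For all $i \in [n-1]$ we have

$$p(r,i) \le (4/\eta)\cdot\min\{i, n-i\}/n.$$

Proof. For $i \in [n-1]$, let $\mathcal{E}_i = \{\pi \in S_n : |w_0 + \sum_{j=1}^i w_{\pi(j)} - \sum_{j=i+1}^n w_{\pi(j)}| < r\}$. By definition we have that $p(r,i) = \Pr_{\pi\sim_R S_n}[\mathcal{E}_i]$.

Let $i \le n/2$. If the event $\mathcal{E}_i$ occurs, we certainly have that $w_0 + \sum_{j=1}^i w_{\pi(j)} - \sum_{j=i+1}^n w_{\pi(j)} \ge -r$, which yields that

$$\sum_{j=1}^i w_{\pi(j)} \ge (1/2)\Big(\sum_{i=1}^n w_i - r - w_0\Big).$$

That is,

$$p(r,i) \le \Pr_{\pi\sim_R S_n}\left[\sum_{j=1}^i w_{\pi(j)} \ge (1/2)\Big(\sum_{i=1}^n w_i - r - w_0\Big)\right].$$

Consider the random variable $X = \sum_{j=1}^i w_{\pi(j)}$ and denote $\alpha \stackrel{\mathrm{def}}{=} (1/2)(\sum_{i=1}^n w_i - r - w_0)$. We will bound from above the probability

$$\Pr_{\pi\sim_R S_n}[X \ge \alpha].$$

Since $\pi$ is chosen uniformly from $S_n$, we have that $\mathbf{E}_{\pi\sim_R S_n}[w_{\pi(j)}] = (1/n)\cdot\sum_{i=1}^n w_i$, hence

$$\mathbf{E}_{\pi\sim_R S_n}[X] = \frac{i}{n}\sum_{i=1}^n w_i.$$

Recalling that $|w_0| \le (1-\eta)\cdot\sum_{i=1}^n w_i$ and noting that $\sum_{i=1}^n w_i \ge \sum_{i\in S} w_i \ge kr \ge (2/\eta)\cdot r$, we get

$$\alpha \ge (\eta/4)\cdot\sum_{i=1}^n w_i.$$

Therefore, noting that $X \ge 0$, by Markov's inequality, we obtain that

$$\Pr_{\pi\sim_R S_n}[X \ge \alpha] \le \frac{\mathbf{E}[X]}{\alpha} \le (4/\eta)\cdot(i/n)$$

as was to be proven.

If $i \ge n/2$, we proceed analogously. If $\mathcal{E}_i$ occurs, we have $w_0 + \sum_{j=1}^i w_{\pi(j)} - \sum_{j=i+1}^n w_{\pi(j)} \le r$, which yields that

$$\sum_{j=i+1}^n w_{\pi(j)} \ge (1/2)\Big(\sum_{i=1}^n w_i + w_0 - r\Big).$$

We then repeat the exact same Markov type argument for the random variable $\sum_{j=i+1}^n w_{\pi(j)}$. This completes the proof of the claim. □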
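Claim 18 can be sanity-checked on a small instance by computing $p(r,i)$ exactly. The sketch below uses an illustrative affine form with $w_0 = 0$ and $\eta = 1$ (so that $|w_0| \le (1-\eta)\sum_i w_i$ holds trivially), $r = 1$ and $k = 2 \ge 2/\eta$; these values are assumptions chosen for the example, not taken from the paper.

```python
from itertools import combinations
from math import comb

def p(w0, w, r, i):
    """Exact p(r, i): fraction of i-subsets of +1 coordinates with |L(x)| < r."""
    n, total = len(w), sum(w)
    hits = 0
    for A in combinations(range(n), i):
        plus = sum(w[j] for j in A)
        if abs(w0 + plus - (total - plus)) < r:
            hits += 1
    return hits / comb(n, i)

def claim18_bound(eta, n, i):
    """Right-hand side of Claim 18: (4 / eta) * min{i, n - i} / n."""
    return (4.0 / eta) * min(i, n - i) / n

# Illustrative instance: nonnegative weights, w0 = 0, eta = 1, r = 1.
# All eight weights are at least r, so |S| = 8 >= k = 2 >= 2 / eta.
w0, w, r, eta = 0.0, [7, 1, 1, 1, 1, 1, 1, 1], 1.0, 1.0
n = len(w)

for i in range(1, n):
    print(i, p(w0, w, r, i), claim18_bound(eta, n, i))
```

On this instance the only balanced layers are $i = 1$ (the heavy coordinate alone is set to $+1$) and $i = 7$ (the heavy coordinate alone is set to $-1$), each with $p(r,i) = 1/8$, against a bound of $1/2$.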

Of course, the above claim is only useful when either $i$ or $n-i$ is relatively small. Fix $i_0 \le n/2$ (to be chosen later). Note that, for all $i \le n/2$, it holds that $q(n,i) \le \frac{2}{i\,\Lambda(n)}$. By Claim 18 we thus get that

$$\sum_{i=1}^{i_0} p(r,i)\,q(n,i) \le \sum_{i=1}^{i_0} \frac{2}{i\,\Lambda(n)}\cdot\frac{4i}{\eta n} \le \frac{8 i_0}{\eta\cdot n\cdot\Lambda(n)}. \qquad (10)$$

By symmetry, we get

$$\sum_{i=n-i_0}^{n-1} p(r,i)\,q(n,i) \le \frac{8 i_0}{\eta\cdot n\cdot\Lambda(n)}. \qquad (11)$$

We proceed to bound from above the term $\sum_{i=i_0+1}^{n-i_0-1} p(r,i)\,q(n,i)$. To this end, we exploit the fact, mentioned earlier, that the $p(r,i)$'s depend only on the affine form and not on the particular symmetric distribution over weight levels. We use a subtle argument to essentially reduce anti-concentration statements about $\mu$ to known anti-concentration results.

For $\delta \in (0,1)$ let $D_\delta$ be the $\delta$-biased distribution over $\{-1,1\}^n$; that is, the product distribution in which each coordinate is 1 with probability $\delta$ and $-1$ with probability $1-\delta$. Denote by $g(\delta,i)$ the probability that $D_\delta$ assigns to $\{-1,1\}^n_{=i}$, i.e., $g(\delta,i) = \binom{n}{i}\delta^i(1-\delta)^{n-i}$. Theorem 5 now yields

$$\Pr_{x\sim D_\delta}[|L(x)| < r] \le \frac{1}{\sqrt{k\delta(1-\delta)}}.$$

Using symmetry, we view a random draw $x \sim D_\delta$ as a two-stage procedure, exactly as in $\mu$, the only difference being that in the first stage we pick the $i$-th weight level of the hypercube, $i \in [0,n]$, with probability $g(\delta,i)$. We can therefore write

$$\Pr_{x\sim D_\delta}[|L(x)| < r] = \sum_{i=0}^n g(\delta,i)\,p(r,i)$$

and thus conclude that

$$\sum_{i=i_0+1}^{n-i_0-1} g(\delta,i)\,p(r,i) \le \sum_{i=0}^n g(\delta,i)\,p(r,i) \le \frac{1}{\sqrt{k\delta(1-\delta)}}. \qquad (12)$$

We now state and prove the following crucial lemma. The idea of the lemma is to bound from above the sum $\sum_{i=i_0+1}^{n-i_0-1} p(r,i)\,q(n,i)$ by suitably averaging over anti-concentration bounds obtained from the $\delta$-biased product distributions:

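Lemma 19. Let $F : [0,1] \to \mathbb{R}_+$ be such that $q(n,i) \le \int_{\delta=0}^1 F(\delta)\,g(\delta,i)\,d\delta$ for all $i \in [i_0+1, n-i_0-1]$. Then,

$$\sum_{i=i_0+1}^{n-i_0-1} p(r,i)\,q(n,i) \le \frac{1}{\sqrt{k}}\cdot\int_{\delta=0}^1 \frac{F(\delta)}{\sqrt{\delta(1-\delta)}}\,d\delta.$$

Proof. We have the following sequence of inequalities:

$$\sum_{i=i_0+1}^{n-i_0-1} p(r,i)\,q(n,i) \le \sum_{i=i_0+1}^{n-i_0-1}\left(\int_{\delta=0}^1 F(\delta)\,g(\delta,i)\,d\delta\right)\cdot p(r,i)$$
$$= \int_{\delta=0}^1 F(\delta)\left(\sum_{i=i_0+1}^{n-i_0-1} g(\delta,i)\,p(r,i)\right)d\delta$$
$$\le \frac{1}{\sqrt{k}}\int_{\delta=0}^1 \frac{F(\delta)}{\sqrt{\delta(1-\delta)}}\,d\delta,$$

where the first line follows from the assumption of the lemma, the second uses linearity and the third uses (12). □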



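We thus need to choose appropriately a function $F$ satisfying the lemma statement which can give a non-trivial bound on the desired sum. Fix $\zeta > 0$, and define $F(\delta)$ as

$$F(\delta) \stackrel{\mathrm{def}}{=} \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\left(\frac{1}{\delta^{1/2-\zeta}} + \frac{1}{(1-\delta)^{1/2-\zeta}}\right).$$

The following claim (proved in Section 4.2) says that this choice of $F(\delta)$ satisfies the conditions of Lemma 19.

Claim 20. For the above choice of $F(\delta)$ and $i_0 < i < n - i_0$, $q(n,i) \le \int_{\delta=0}^1 F(\delta)\,g(\delta,i)\,d\delta$.

Now, applying Lemma 19 for this choice of $F(\delta)$, we get that

$$\sum_{i=i_0+1}^{n-i_0-1} p(r,i)\,q(n,i) \le \frac{1}{\sqrt{k}}\cdot\frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\int_{\delta=0}^1\left(\frac{1}{\delta^{1/2-\zeta}} + \frac{1}{(1-\delta)^{1/2-\zeta}}\right)\frac{1}{\sqrt{\delta(1-\delta)}}\,d\delta$$
$$= O\left(\frac{1}{\zeta\cdot\sqrt{k}\cdot\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\right).$$

We choose (with foresight) $i_0 = \lceil n/k^{1/3}\rceil$. Then the above expression simplifies to

$$\sum_{i=i_0+1}^{n-i_0-1} p(r,i)\,q(n,i) = O\left(\frac{1}{\Lambda(n)}\cdot\frac{1}{k^{1/3-\zeta}}\cdot\frac{1}{\zeta}\right).$$

Now plugging $i_0 = \lceil n/k^{1/3}\rceil$ into (10) and (11), we get

$$\sum_{i \le i_0 \,\vee\, i \ge n-i_0} p(r,i)\,q(n,i) = O\left(\frac{1}{\Lambda(n)}\cdot\frac{1}{k^{1/3}}\cdot\frac{1}{\eta}\right).$$

Combining these equations (and recalling that $\Lambda(n) = \Theta(\log n)$), we get the final result, and Theorem 16 is proved. □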
4.2 Proof of Claim 20

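We will need the following basic facts:

Fact 21. For $x, y \in \mathbb{R}_+$ let $\Gamma : \mathbb{R}_+ \to \mathbb{R}$ be the usual "Gamma" function, so that

$$\int_{\delta=0}^1 \delta^x(1-\delta)^y\,d\delta = \frac{\Gamma(x+1)\cdot\Gamma(y+1)}{\Gamma(x+y+2)}.$$

Recall that for $z \in \mathbb{Z}_+$, $\Gamma(z) = (z-1)!$.

Fact 22. (Stirling's approximation) For $z \in \mathbb{R}_+$, we have $\Gamma(z) = \sqrt{\frac{2\pi}{z}}\cdot\big(\frac{z}{e}\big)^z\cdot\big(1 + O(\frac{1}{z})\big)$. In particular, there is an absolute constant $c_0 > 0$ such that for $z \ge c_0$ the value $\Gamma(z)$ is within a factor of 2 of $\sqrt{\frac{2\pi}{z}}\cdot\big(\frac{z}{e}\big)^z$.

Fact 23. For $x \in \mathbb{R}$ and $x \ge 2$, we have $(1 - \frac{1}{x})^x \ge \frac{1}{4}$.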
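Facts 21 and 23 are easy to confirm numerically. The sketch below compares a midpoint-rule approximation of the integral in Fact 21 against the Gamma-function closed form, including a non-integer exponent of the kind used in the proof of Claim 20, and evaluates the left-hand side of Fact 23 at a few points. The function names and test values are illustrative assumptions.

```python
import math

def beta_integral_numeric(x, y, steps=20_000):
    """Midpoint-rule approximation of the integral of d^x (1-d)^y over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        d = (k + 0.5) * h
        total += d ** x * (1.0 - d) ** y
    return h * total

def beta_integral_gamma(x, y):
    """Closed form from Fact 21: Gamma(x+1) Gamma(y+1) / Gamma(x+y+2)."""
    return math.gamma(x + 1) * math.gamma(y + 1) / math.gamma(x + y + 2)

def fact23_lhs(x):
    """Left-hand side of Fact 23: (1 - 1/x)^x."""
    return (1.0 - 1.0 / x) ** x

# Integer exponents: the closed form is 2! * 3! / 6! = 1/60.
print(beta_integral_numeric(2, 3), beta_integral_gamma(2, 3))
# Non-integer exponent x = i - 1/2 + zeta with i = 3, zeta = 0.1, and y = n - i = 4.
print(beta_integral_numeric(2.6, 4), beta_integral_gamma(2.6, 4))
```

The midpoint rule is used only to keep the example dependency-free; any quadrature routine would serve the same purpose.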

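We can now proceed with the proof of Claim 20. We consider the case when $i_0 < i \le n/2$. (The proof of the complementary case $n - i_0 - 1 \ge i \ge n/2$ is essentially identical.) We have the following chain of inequalities:

$$\int_{\delta=0}^1 F(\delta)\,g(\delta,i)\,d\delta = \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\int_{\delta=0}^1 \binom{n}{i}\delta^i(1-\delta)^{n-i}\left(\frac{1}{\delta^{1/2-\zeta}} + \frac{1}{(1-\delta)^{1/2-\zeta}}\right)d\delta$$
$$\ge \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\binom{n}{i}\int_{\delta=0}^1 \delta^{i-1/2+\zeta}(1-\delta)^{n-i}\,d\delta$$
$$= \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\binom{n}{i}\cdot\frac{\Gamma(n-i+1)\cdot\Gamma(i+1/2+\zeta)}{\Gamma(n+3/2+\zeta)} \quad\text{(using Fact 21)}$$
$$= \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\frac{\Gamma(n+1)}{\Gamma(i+1)\cdot\Gamma(n-i+1)}\cdot\frac{\Gamma(n-i+1)\cdot\Gamma(i+1/2+\zeta)}{\Gamma(n+3/2+\zeta)}$$
$$= \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\frac{\Gamma(n+1)\cdot\Gamma(i+1/2+\zeta)}{\Gamma(i+1)\cdot\Gamma(n+3/2+\zeta)}.$$

We now proceed to bound from below the right hand side of the last inequality. Towards that, using Fact 22 and assuming $n$ and $i$ are large enough, we have

$$\frac{\Gamma(n+1)\cdot\Gamma(i+1/2+\zeta)}{\Gamma(i+1)\cdot\Gamma(n+3/2+\zeta)} \ge \frac{1}{16}\cdot\frac{(n+1)^{n+1/2}}{(i+1)^{i+1/2}}\cdot\frac{(i+1/2+\zeta)^{i+\zeta}}{(n+3/2+\zeta)^{n+\zeta+1}}$$
$$\ge \frac{1}{16}\cdot\frac{1}{n+2}\cdot\frac{(n+1)^{n+1/2}}{(i+1)^{i+1/2}}\cdot\frac{(i+1/2+\zeta)^{i+\zeta}}{(n+3/2+\zeta)^{n+\zeta}}$$
$$= \frac{1}{16}\cdot\frac{1}{n+2}\cdot\frac{(n+1)^{n+\zeta}}{(n+3/2+\zeta)^{n+\zeta}}\cdot\frac{(i+1/2+\zeta)^{i+\zeta}}{(i+1)^{i+\zeta}}\cdot\frac{(n+1)^{1/2-\zeta}}{(i+1)^{1/2-\zeta}}$$
$$\ge \frac{1}{256}\cdot\frac{1}{n+2}\cdot\frac{(n+1)^{1/2-\zeta}}{(i+1)^{1/2-\zeta}} \ge \frac{1}{512}\cdot\frac{1}{(n+1)^{1/2+\zeta}}\cdot\frac{1}{(i+1)^{1/2-\zeta}},$$

where the second-to-last inequality applies Fact 23 to each of the two ratios raised to the powers $n+\zeta$ and $i+\zeta$.

Plugging this back, we get

$$\int_{\delta=0}^1 F(\delta)\,g(\delta,i)\,d\delta \ge \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\frac{\Gamma(n+1)\cdot\Gamma(i+1/2+\zeta)}{\Gamma(i+1)\cdot\Gamma(n+3/2+\zeta)}$$
$$\ge \frac{1024}{\Lambda(n)}\cdot\frac{(n+1)^{1/2+\zeta}}{i_0^{1/2+\zeta}}\cdot\frac{1}{512}\cdot\frac{1}{(n+1)^{1/2+\zeta}}\cdot\frac{1}{(i+1)^{1/2-\zeta}}$$
$$= \frac{2}{\Lambda(n)}\cdot\frac{1}{i_0^{1/2+\zeta}}\cdot\frac{1}{(i+1)^{1/2-\zeta}} \ge \frac{2}{\Lambda(n)}\cdot\frac{1}{i} \ge q(n,i),$$

which concludes the proof of the claim. □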

5 A Useful Algorithmic Tool 

In this section we describe a useful algorithmic tool arising from recent work in computational complexity theory. The main result we will need is the following theorem of [TTV08] (the ideas go back to [Imp95] and were used in a different form in [DDFS12]):

Theorem 24. ([TTV08]) Let $X$ be a finite domain, $\mu$ be a samplable probability distribution over $X$, $f : X \to [-1,1]$ be a bounded function, and $\mathcal{L}$ be a finite family of Boolean functions $\ell : X \to \{-1,1\}$. There is an algorithm Boosting-TTV with the following properties: Suppose Boosting-TTV is given as input a list $(a_\ell)_{\ell\in\mathcal{L}}$ of real values and a parameter $\xi > 0$ such that $|\mathbf{E}_{x\sim\mu}[f(x)\ell(x)] - a_\ell| \le \xi/16$ for every $\ell \in \mathcal{L}$. Then Boosting-TTV outputs a function $h : X \to [-1,1]$ with the following properties:

(i) $|\mathbf{E}_{x\sim\mu}[\ell(x)h(x) - \ell(x)f(x)]| \le \xi$ for every $\ell \in \mathcal{L}$;

(ii) $h(x)$ is of the form $h(x) = P_1\big(\frac{\xi}{2}\cdot\sum_{\ell\in\mathcal{L}} w_\ell\,\ell(x)\big)$ where the $w_\ell$'s are integers whose absolute values sum to $O(1/\xi^2)$.

The algorithm runs for $O(1/\xi^2)$ iterations, where in each iteration it estimates $\mathbf{E}_{x\sim\mu}[h'(x)\ell(x)]$ to within additive accuracy $\pm\xi/16$. Here each $h'$ is a function of the form $h'(x) = P_1\big(\frac{\xi}{2}\cdot\sum_{\ell\in\mathcal{L}} v_\ell\,\ell(x)\big)$, where the $v_\ell$'s are integers whose absolute values sum to $O(1/\xi^2)$.

We note that Theorem 24 is not explicitly stated in the above form in [TTV08]; in particular, neither the time complexity of the algorithm nor the fact that it suffices for the algorithm to be given "noisy" estimates $a_\ell$ of the values $\mathbf{E}_{x\sim\mu}[f(x)\ell(x)]$ is explicitly stated in [TTV08]. So for the sake of completeness, in the following we state the algorithm in full (see Figure 1) and sketch a proof of correctness of this algorithm using results that are explicitly proved in [TTV08].



Proof of Theorem 24. It is clear from the description of the algorithm that (if and) when the algorithm Boosting-TTV terminates, the output $h$ satisfies property (i) and has the form $h(x) = P_1\big(\frac{\xi}{2}\cdot\sum_{\ell\in\mathcal{L}} w_\ell\,\ell(x)\big)$ where each $w_\ell$ is an integer. It remains to bound the number of iterations (which gives a bound on the sum of magnitudes of the $w_\ell$'s) and indeed to show that the algorithm terminates at all.

Towards this, we recall that Claim 3.4 in [TTV08] states the following:

Claim 25. For all $x \in \mathrm{supp}(\mu)$ and all $t \ge 1$, we have $\sum_{j=1}^t f_j(x)\,(f(x) - h_{j-1}(x)) \le (4/\gamma) + (\gamma t)/2$.

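Boosting-TTV

Parameters:
$\xi$ := positive real number
$\mu$ := samplable distribution over finite domain $X$
$\mathcal{L}$ := finite list of functions such that all $\ell \in \mathcal{L}$ map $X$ to $\{-1,1\}$.
$(a_\ell)_{\ell\in\mathcal{L}}$ := list of real numbers with the promise that some $f : X \to [-1,1]$ has $|\mathbf{E}_{x\sim\mu}[f(x)\ell(x)] - a_\ell| \le \xi/16$ for all $\ell \in \mathcal{L}$.

Output:
An LBF $h(x) \equiv P_1\big(\frac{\xi}{2}\cdot\sum_{\ell\in\mathcal{L}} w_\ell\,\ell(x)\big)$, where $w_\ell \in \mathbb{Z}$, such that $|\mathbf{E}_{x\sim\mu}[h(x)\ell(x) - f(x)\ell(x)]| \le \xi$ for all $\ell \in \mathcal{L}$.

Algorithm:

1. Let $\mathcal{L}' := \{\ell : \ell \in \mathcal{L} \text{ or } -\ell \in \mathcal{L}\}$. Fix $\gamma := \xi/2$.

2. Let $h_0 := 0$. Set $t = 0$.

3. For each $\ell \in \mathcal{L}$, find $a_{\ell,t} \in \mathbb{R}$ such that $|\mathbf{E}_{x\sim\mu}[h_t(x)\ell(x)] - a_{\ell,t}| \le \xi/16$.

4. If $|a_\ell - a_{\ell,t}| \le \gamma$ for all $\ell \in \mathcal{L}$, then stop and output $h_t$. Otherwise, fix $\ell$ to be any element of $\mathcal{L}$ such that $|a_\ell - a_{\ell,t}| > \gamma$.

• If $a_\ell - a_{\ell,t} > \gamma$ then set $f_{t+1} := \ell$ else set $f_{t+1} := -\ell$. Note that $f_{t+1} \in \mathcal{L}'$.

• Define $h_{t+1}$ as $h_{t+1}(x) := P_1\big(\gamma\,(\sum_{j=1}^{t+1} f_j(x))\big)$.

5. Set $t = t + 1$ and go to Step 3.

Figure 1: Boosting based algorithm from [TTV08]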
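The loop in Figure 1 can be sketched in a few lines when the domain is small enough to compute every expectation exactly, so that the estimates in Step 3 have zero error. The sketch below is an illustration under that assumption and is not the implementation analyzed in the paper, which uses sampling; the uniform distribution, the target function and all names are hypothetical choices for the example.

```python
from itertools import product

def clip(t):
    """P_1: project a real number onto the interval [-1, 1]."""
    return max(-1.0, min(1.0, t))

def boosting_ttv(domain, mu, family, a, xi, max_iters=10_000):
    """Sketch of Boosting-TTV with exact expectations in place of estimates.

    domain: list of points; mu: list of probabilities in the same order;
    family: list of functions X -> {-1, 1}; a: target correlations a_l.
    Returns (weights, h) with h(x) = P_1(gamma * sum_l weights[l] * l(x)).
    """
    gamma = xi / 2.0
    weights = [0] * len(family)

    def h(x):
        # Reads the current contents of `weights`, so it always reflects h_t.
        return clip(gamma * sum(wl * l(x) for wl, l in zip(weights, family)))

    for _ in range(max_iters):
        # Step 3: correlation of the current hypothesis with each l.
        a_t = [sum(m * h(x) * l(x) for x, m in zip(domain, mu)) for l in family]
        # Step 4: stop once every correlation is matched to within gamma.
        bad = [j for j in range(len(family)) if abs(a[j] - a_t[j]) > gamma]
        if not bad:
            return weights, h
        j = bad[0]
        # f_{t+1} = l_j or -l_j, recorded as an integer weight update.
        weights[j] += 1 if a[j] - a_t[j] > gamma else -1
    raise RuntimeError("did not converge within max_iters")

# Illustrative instance: uniform distribution on {-1,1}^4, family {1, x_1..x_4}.
domain = list(product([-1, 1], repeat=4))
mu = [1.0 / len(domain)] * len(domain)
family = [lambda x: 1] + [lambda x, i=i: x[i] for i in range(4)]
f = lambda x: 1 if 2 * x[0] + x[1] + x[2] + x[3] - 0.5 >= 0 else -1
a = [sum(m * f(x) * l(x) for x, m in zip(domain, mu)) for l in family]
xi = 0.2
weights, h = boosting_ttv(domain, mu, family, a, xi)
print(weights)
```

Storing the hypothesis as integer weights mirrors property (ii) of Theorem 24: the number of loop iterations equals the sum of the absolute values of the weights.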

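We now show how this immediately gives Theorem 24. Fix any $j \ge 0$, and suppose without loss of generality that $a_\ell - a_{\ell,j} > \gamma$. We have that

$$|\mathbf{E}_{x\sim\mu}[f_{j+1}(x)h_j(x)] - a_{\ell,j}| \le \xi/16 \quad\text{and hence}\quad \mathbf{E}_{x\sim\mu}[f_{j+1}(x)h_j(x)] \le a_{\ell,j} + \xi/16,$$

and similarly

$$|\mathbf{E}_{x\sim\mu}[f_{j+1}(x)f(x)] - a_\ell| \le \xi/16 \quad\text{and hence}\quad \mathbf{E}_{x\sim\mu}[f_{j+1}(x)f(x)] \ge a_\ell - \xi/16.$$

Combining these inequalities with $a_\ell - a_{\ell,j} > \gamma = \xi/2$, we conclude that

$$\mathbf{E}_{x\sim\mu}[f_{j+1}(x)(f(x) - h_j(x))] \ge 3\xi/8.$$

Putting this together with Claim 25, we get that

$$\frac{3\xi t}{8} \le \sum_{j=1}^t \mathbf{E}_{x\sim\mu}[f_j(x)(f(x) - h_{j-1}(x))] \le \frac{4}{\gamma} + \frac{\gamma t}{2}.$$

Since $\gamma = \xi/2$, this means that if the algorithm runs for $t$ time steps, then $8/\xi \ge (\xi t)/8$, which implies that $t \le 64/\xi^2$. This concludes the proof. □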
6 Our Main Results

In this section we combine ingredients from the previous sections and prove our main results, Theorems 26 and 27.

Our first main result gives an algorithm that works if any monotone increasing $\eta$-reasonable LTF has approximately the right Shapley values:

Theorem 26. There is an algorithm IS (for Inverse-Shapley) with the following properties. IS is given as input an accuracy parameter $\varepsilon > 0$, a confidence parameter $\delta > 0$, and $n$ real values $a(1), \ldots, a(n)$; its output is a pair $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$. Its running time is $\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)}, \log(1/\delta))$. The performance guarantees of IS are the following:

1. Suppose there is a monotone increasing $\eta$-reasonable LTF $f(x)$ such that $d_{\mathrm{Shapley}}(a, f) \le 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$. Then with probability $1 - \delta$ algorithm IS outputs $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$ which are such that the LTF $h(x) = \mathrm{sign}(v\cdot x - \theta)$ has $d_{\mathrm{Shapley}}(f, h) \le \varepsilon$.

2. For any input vector $(a(1), \ldots, a(n))$, the probability that IS outputs $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$ such that the LTF $h(x) = \mathrm{sign}(v\cdot x - \theta)$ has $d_{\mathrm{Shapley}}(f, h) > \varepsilon$ is at most $\delta$.

Proof. We first note that we may assume $\varepsilon > n^{-c}$ for a constant $c > 0$ of our choosing, for if $\varepsilon \le n^{-c}$ then the claimed running time is large enough that we can easily enumerate all LTFs over $n$ variables (by trying all weight vectors with integer weights at most $n^n$; this suffices by [MTT61]) and compute their Shapley values exactly, and thus solve the problem. So for the rest of the proof we assume that $\varepsilon > n^{-c}$.

It will be obvious from the description of IS that property (2) above is satisfied, so the main 
job is to establish (1). Before giving the formal proof we first describe an algorithm and analysis 
achieving (1) for an idealized version of the problem. We then describe the actual algorithm and 
its analysis (which build on the idealized version). 

Recall that the algorithm is given as input $\varepsilon$, $\delta$ and $a(1), \ldots, a(n)$ that satisfy $d_{\mathrm{Shapley}}(a, f) \le 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$ for some monotone increasing $\eta$-reasonable LTF $f$. The idealized version of the problem is the following: we assume that the algorithm is also given the two real values $f^*(0)$ and $\sum_{i=1}^n f^*(i)/n$. It is also helpful to note that since $f$ is monotone and $\eta$-reasonable (and hence is not a constant function), it must be the case that $f(1) = 1$ and $f(-1) = -1$.

The algorithm for this idealized version is as follows: first, using Lemma 8, the values $a(i)$, $i = 1, \ldots, n$ are converted into values $a^*(i)$ which are approximations for the values $f^*(i)$. Each $a^*(i)$ satisfies $|a^*(i) - f^*(i)| \le 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$. The algorithm sets $a^*(0)$ to $f^*(0)$. Next, the algorithm runs Boosting-TTV with the following input: the family $\mathcal{L}$ of Boolean functions is $\{1, x_1, \ldots, x_n\}$; the values $a^*(0), \ldots, a^*(n)$ comprise the list of real values; $\mu$ is the distribution; and the parameter $\xi$ is set to $1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$. (We note that each execution of Step 3 of Boosting-TTV, namely finding values that closely estimate $\mathbf{E}_{x\sim\mu}[h_t(x)x_i]$ as required, is easily achieved using a standard sampling scheme; for completeness in Appendix B we describe a procedure Estimate-Correlation that can be used to do all the required estimations with overall failure probability at most $\delta$.) Boosting-TTV outputs an LBF $h(x) = P_1(v\cdot x - \theta)$; the output of our overall algorithm is the LTF $h'(x) = \mathrm{sign}(v\cdot x - \theta)$.

Let us analyze this algorithm for the idealized scenario. By Theorem 24, the output function $h$ that is produced by Boosting-TTV is an LBF $h(x) = P_1(v\cdot x - \theta)$ that satisfies $\sqrt{\sum_{j=0}^n (h^*(j) - f^*(j))^2} = 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$. Given this, Lemma 13 implies that $d_{\mathrm{Fourier}}(f, h) \le \rho \stackrel{\mathrm{def}}{=} 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$.

At this point, we have established that $h$ is a bounded function that has $d_{\mathrm{Fourier}}(f, h) \le 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$. We would like to apply Lemma 15 and thereby assert that the $\ell_1$ distance between $f$ and $h$ (with respect to $\mu$) is small. To see that we can do this, we first note that since $f$ is a monotone increasing $\eta$-reasonable LTF, by Theorem 3 it has a representation as $f(x) = \mathrm{sign}(w\cdot x + w_0)$ whose weights satisfy the properties claimed in that theorem; in particular, for any choice of $\zeta > 0$, after rescaling all the weights, the largest-magnitude weight has magnitude 1, and the $k \stackrel{\mathrm{def}}{=} \Theta_{\zeta,\eta}(1/\varepsilon^{6+2\zeta})$ largest-magnitude weights each have magnitude at least $r \stackrel{\mathrm{def}}{=} 1/(n\cdot k^{O(k)})$. (Note that since $\varepsilon \ge n^{-c}$ we indeed have $k \le n$ as required.) Given this, Theorem 16 implies that the affine form $L(x) = w\cdot x + w_0$ satisfies

$$\Pr_{x\sim\mu}[|L(x)| < r] \le \kappa \stackrel{\mathrm{def}}{=} \varepsilon^2/(512\log(n)), \qquad (13)$$

i.e., it is $(r,\kappa)$-anticoncentrated with $\kappa = \varepsilon^2/(512\log(n))$. Thus we may indeed apply Lemma 15 and it gives us that

$$\mathbf{E}_{x\sim\mu}[|f(x) - h(x)|] \le \frac{4\|w\|_1\sqrt{\rho}}{r} + 2\kappa \le \varepsilon^2/(128\log n). \qquad (14)$$

Now let $h' : \{-1,1\}^n \to \{-1,1\}$ be the LTF defined as $h'(x) = \mathrm{sign}(v\cdot x - \theta)$ (recall that $h$ is the LBF $P_1(v\cdot x - \theta)$). Since $f$ is a $\{-1,1\}$-valued function, it is clear that for every input $x$ in the support of $\mu$, the contribution of $x$ to $\Pr_{x\sim\mu}[f(x) \ne h'(x)]$ is at most twice its contribution to $\mathbf{E}_{x\sim\mu}[|f(x) - h(x)|]$. Thus we have that $\Pr_{x\sim\mu}[f(x) \ne h'(x)] \le \varepsilon^2/(64\log n)$. We may now apply Fact 7 to obtain that $d_{\mathrm{Fourier}}(f, h') \le \varepsilon/(4\sqrt{\log n})$. Finally, Lemma 12 gives that

$$d_{\mathrm{Shapley}}(f, h') \le 4/\sqrt{n} + \frac{\Lambda(n)}{2\beta}\cdot\varepsilon/(4\sqrt{\log n}) \le \varepsilon/2.$$

So indeed the LTF $h'(x) = \mathrm{sign}(v\cdot x - \theta)$ satisfies $d_{\mathrm{Shapley}}(f, h') \le \varepsilon/2$ as desired.

Now we turn from the idealized scenario to actually prove Theorem 26, where we are not given the values of $f^*(0)$ and $\sum_{i=1}^n f^*(i)/n$. To get around this, we note that $f^*(0), \sum_{i=1}^n f^*(i)/n \in [-1,1]$. So the idea is that we will run the idealized algorithm repeatedly, trying "all" possibilities (up to some prescribed granularity) for $f^*(0)$ and for $\sum_{i=1}^n f^*(i)/n$. At the end of each such run we have a "candidate" LTF $h'$; we use a simple procedure Shapley-Estimate (see Appendix B) to estimate $d_{\mathrm{Shapley}}(f, h')$ to within additive accuracy $\pm\varepsilon/10$, and we output any $h'$ whose estimated value of $d_{\mathrm{Shapley}}(f, h')$ is at most $8\varepsilon/10$.

We may run the idealized algorithm $\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$ times without changing its overall running time (up to polynomial factors). Thus we can try a net of possible guesses for $f^*(0)$ and $\sum_{i=1}^n f^*(i)/n$ which is such that one guess will be within $\pm 1/\mathrm{poly}(n, 2^{\mathrm{poly}(1/\varepsilon)})$ of the correct values for both parameters. It is straightforward to verify that the analysis of the idealized scenario given above is sufficiently robust that when these "good" guesses are encountered, the algorithm will with high probability generate an LTF $h'$ that has $d_{\mathrm{Shapley}}(f, h') \le 6\varepsilon/10$. A straightforward analysis of running time and failure probability shows that properties (1) and (2) are achieved as desired, and Theorem 26 is proved. □
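For very small $n$, the Shapley values of a candidate LTF and the Shapley distance to a target vector can be computed exactly by enumerating all orderings, which is a useful reference point when testing a sampling-based estimator such as Shapley-Estimate. The sketch below assumes the permutation-based definition of the Shapley-Shubik index over the domain $\{-1,1\}^n$ (the expected change in $f$ when voter $i$ switches from $-1$ to $+1$ after all voters preceding $i$ in a random ordering have been set to $+1$), and it uses the convention $\mathrm{sign}(0) = 1$; both are assumptions of this example, and the function names are hypothetical.

```python
import math
from itertools import permutations

def ltf(v, theta):
    """Return the LTF x -> sign(v . x - theta), with sign(0) taken to be +1."""
    return lambda x: 1 if sum(a * b for a, b in zip(v, x)) - theta >= 0 else -1

def shapley_indices(f, n):
    """Exact Shapley-Shubik indices of f by averaging over all n! orderings."""
    totals = [0.0] * n
    count = 0
    for order in permutations(range(n)):
        x = [-1] * n
        for i in order:
            before = f(x)
            x[i] = 1            # voter i joins the coalition of +1 voters
            totals[i] += f(x) - before
        count += 1
    return [t / count for t in totals]

def shapley_distance(u, v):
    """Euclidean distance between two vectors of Shapley values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

majority3 = shapley_indices(ltf([1, 1, 1], 0), 3)
weighted = shapley_indices(ltf([2, 1, 1], 0.5), 3)
print(majority3, weighted, shapley_distance(majority3, weighted))
```

Along any ordering the increments telescope, so the indices always sum to $f(1) - f(-1)$; this gives a quick consistency check for any estimator.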

For any monotone $\eta$-reasonable target LTF $f$, Theorem 26 constructs an output LTF whose Shapley distance from $f$ is at most $\varepsilon$, but the running time is exponential in $\mathrm{poly}(1/\varepsilon)$. We now show that if the target monotone $\eta$-reasonable LTF $f$ has integer weights that are at most $W$, then we can construct an output LTF $h$ with $d_{\mathrm{Shapley}}(f, h) \le n^{-1/8}$ running in time $\mathrm{poly}(n, W)$; this is a far faster running time than provided by Theorem 26 for such small $\varepsilon$. (The "1/8" is chosen for convenience; it will be clear from the proof that any constant strictly less than 1/6 would suffice.)
Theorem 27. There is an algorithm ISBW (for Inverse-Shapley with Bounded Weights) with the following properties. ISBW is given as input a weight bound $W \in \mathbb{Z}_+$, a confidence parameter $\delta > 0$, and $n$ real values $a(1), \ldots, a(n)$; its output is a pair $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$. Its running time is $\mathrm{poly}(n, W, \log(1/\delta))$. The performance guarantees of ISBW are the following:

1. Suppose there is a monotone increasing $\eta$-reasonable LTF $f(x) = \mathrm{sign}(u\cdot x - \theta)$, where each $u_i$ is an integer with $|u_i| \le W$, such that $d_{\mathrm{Shapley}}(a, f) \le 1/\mathrm{poly}(n, W)$. Then with probability $1 - \delta$ algorithm ISBW outputs $v \in \mathbb{R}^n$, $\theta \in \mathbb{R}$ which are such that the LTF $h(x) = \mathrm{sign}(v\cdot x - \theta)$ has $d_{\mathrm{Shapley}}(f, h) \le n^{-1/8}$.

2. For any input vector $(a(1), \ldots, a(n))$, the probability that ISBW outputs $v, \theta$ such that the LTF $h(x) = \mathrm{sign}(v\cdot x - \theta)$ has $d_{\mathrm{Shapley}}(f, h) > n^{-1/8}$ is at most $\delta$.

Proof. Let $f(x) = \mathrm{sign}(u\cdot x - \theta)$ be as described in the theorem statement. We may assume that each $|u_i| \ge 1$ (by scaling all the $u_i$'s and $\theta$ by $2n$ and then replacing any zero-weight $u_i$ with 1). Next we observe that for such an affine form $u\cdot x - \theta$, Theorem 16 immediately yields the following corollary:

Corollary 28. Let $L(x) = \sum_{i=1}^n u_i x_i - \theta$ be a monotone increasing $\eta$-reasonable affine form. Suppose that $u_i \ge r$ for all $i = 1, \ldots, n$. Then for any $\zeta > 0$, we have

$$\Pr_{x\sim\mu}[|L(x)| < r] = O\left(\frac{1}{\log n}\cdot\frac{1}{n^{1/3-\zeta}}\cdot\Big(\frac{1}{\zeta} + \frac{1}{\eta}\Big)\right).$$

With this anti-concentration statement in hand, the proof of Theorem 27 closely follows the proof of Theorem 26. The algorithm runs Boosting-TTV with $\mathcal{L}$, $a^*(i)$ and $\mu$ as before but now with $\xi$ set to $1/\mathrm{poly}(n, W)$. The LBF $h$ that Boosting-TTV outputs satisfies $d_{\mathrm{Fourier}}(f, h) \le \rho \stackrel{\mathrm{def}}{=} 1/\mathrm{poly}(n, W)$. We apply Corollary 28 to a suitably rescaled version $L(x)$ of the affine form $u\cdot x - \theta$ and get that for $r = 1/\mathrm{poly}(n, W)$, we have

$$\Pr_{x\sim\mu}[|L(x)| < r] \le \kappa \stackrel{\mathrm{def}}{=} \varepsilon^2/(1024\log n) \qquad (15)$$

where now $\varepsilon \stackrel{\mathrm{def}}{=} n^{-1/8}$, in place of Equation (13). Applying Lemma 15 (with $w$ the weight vector of $L$) we get that

$$\mathbf{E}_{x\sim\mu}[|f(x) - h(x)|] \le \frac{4\|w\|_1\sqrt{\rho}}{r} + 4\kappa \le \varepsilon^2/(128\log n)$$

analogous to (14). The rest of the analysis goes through exactly as before, and we get that the LTF $h'(x) = \mathrm{sign}(v\cdot x - \theta)$ satisfies $d_{\mathrm{Shapley}}(f, h') \le \varepsilon/2$ as desired. The rest of the argument is unchanged so we do not repeat it. □

7 Conclusions and Future Work 

The problem of designing a weighted voting game that (exactly or approximately) achieves a desired 
set of Shapley values has received considerable attention in the social choice literature, where 
several heuristics and exponential time algorithms have been proposed. This work provides the 
first provably correct efficient approximation algorithm for this problem. 

An obvious open problem is to improve the dependence on the error parameter $\varepsilon$ in the running time. Since the running time of our algorithm is of the form $a(\varepsilon)\cdot n^c$ for a fixed universal constant $c$, the algorithm is an Efficient Polynomial Time Approximation Scheme (EPTAS). Is there a Fully Polynomial Time Approximation Scheme (FPTAS), i.e., an algorithm with running time $\mathrm{poly}(n, 1/\varepsilon)$?

It would also be interesting to characterize the complexity of the exact problem (i.e., that of designing a weighted voting game that exactly achieves a given set of Shapley values, or deciding that no such game exists). We conjecture that the exact problem is intractable, namely #P-hard.

Acknowledgement. We would like to thank Edith Elkind for asking the question about Shapley 
values and for useful pointers to the literature. We thank Christos Papadimitriou for insightful 
conversations. 
Appendix



A LTF representations with "nice" weights 

In this section, we prove Theorem 3. This theorem essentially says that given any $\eta$-reasonable LTF, there is an equivalent representation of the LTF which is also $\eta$-reasonable and is such that the weights of the linear form (when arranged in decreasing order of magnitude) decrease somewhat "smoothly." For convenience we recall the exact statement of the theorem:

Theorem 3. Let $f : \{-1,1\}^n \to \{-1,1\}$ be an $\eta$-reasonable LTF and $k \in [2,n]$. There exists a representation of $f$ as $f(x) = \mathrm{sign}(v_0 + \sum_{i=1}^n v_i x_i)$ such that (after reordering coordinates so that condition (i) below holds) we have: (i) $|v_i| \ge |v_{i+1}|$, $i \in [n-1]$; (ii) $|v_0| \le (1-\eta)\sum_{i=1}^n |v_i|$; and (iii) for all $i \in [0, k-1]$ we have $|v_i| \le (2/\eta)\cdot\sqrt{n}\cdot k^{k/2}\cdot\sigma_k$, where $\sigma_k \stackrel{\mathrm{def}}{=} \sqrt{\sum_{j\ge k} v_j^2}$.

Proof of Theorem\^ The proof proceeds along similar lines as the proof of Lemma 5.1 from 
[OSllj (itself an adaptation of the argument of Muroga et. al. from [MTT61] ) with some crucial 
modifications. 

Since / is 77-reasonable, there exists a representation as f{x) = sign(tt;o + Yll)=i '^i^i) (where 
we assume w.l.o.g. that \wi\ > \wi+i\ for all i G [n — 1]) such that \wq\ < (1 — 'n)Yll)=i Of 
course, this representation may not satisfy condition (iii) of the theorem statement. We proceed 
to construct the desired alternate representation as follows: First, we set Vi = Wi for all i > k. We 
then set up a feasible linear program CV with variables uq, . . . , and argue that there exists a 
feasible solution to CV with the desired properties. 

Let h : {±1}^^"'^ — t- M denote the affine form h{x) = + S^=i "^j^j- We consider the following 
linear system S of 2^'"-*^ equations in k unknowns uq, . . . , Uk-i- For each x G {±1}'^"-'^ we include 
the equation 

fc-i 

i=l 

It is clear that the system S is satisfiable, since (uq, . . . , u^-i) = {wq, . . . , Wk-i) is a solution. 

We now relax the above linear system into the linear program CV (over the same variables) as 

follows: Let C =^ ^/no}^. Our linear program has the following constraints: 

• For each x G {±1}^'"-'^ we include the (in)equality: 

> C if h(x) > C, 

= h{x) i{\h{x)\<C, (16) 
< -C if h{x) < -C. 

• For each $i \in [0, k-1]$, we add the constraints $\mathrm{sign}(u_i) = \mathrm{sign}(w_i)$. Since the $w_i$'s are known,
these are linear constraints, i.e., constraints like $u_1 < 0$, $u_2 > 0$, etc.

• We also add the constraints of the form $|u_i| \ge |u_{i+1}|$ for $1 \le i \le k-2$ and also $|u_{k-1}| \ge |w_k|$.
Note that these constraints are equivalent to the linear constraints: $u_i \cdot \mathrm{sign}(w_i) \ge u_{i+1} \cdot
\mathrm{sign}(w_{i+1})$ and $\mathrm{sign}(w_{k-1}) \cdot u_{k-1} \ge |w_k|$.

• We let $q = \lceil 1/\eta \rceil$ and $\eta' = 1/q$. Clearly, $\eta' \le \eta$. We now add the constraint

\[ |u_0| \le (1-\eta') \cdot \Bigl( \sum_{j=1}^{k-1} |u_j| + \sum_{j=k}^{n} |w_j| \Bigr). \]

Note that this is also a linear constraint over the variables $u_0, u_1, \ldots, u_{k-1}$. Indeed, it can be
equivalently written as:

\[ \mathrm{sign}(w_0) \cdot u_0 - (1-\eta') \sum_{j=1}^{k-1} \mathrm{sign}(w_j) \cdot u_j \le (1-\eta') \sum_{j=k}^{n} |w_j|. \]

Note that the RHS is strictly bounded from above by $C$, since

\[ \sum_{j=k}^{n} |w_j| \le \sqrt{n-k+1} \cdot \sigma_k < \sqrt{n}\,\sigma_k, \]

where the first inequality is Cauchy-Schwarz and the second uses the fact that $k \ge 2$.

We observe that the above linear program is feasible. Indeed, it is straightforward to verify
that all the constraints are satisfied by the vector $(w_0, \ldots, w_{k-1})$. In particular, the last constraint
is satisfied because $|w_0| \le (1-\eta) \cdot \bigl( \sum_{j=1}^{k-1} |w_j| + \sum_{j=k}^{n} |w_j| \bigr)$, hence a fortiori,
$|w_0| \le (1-\eta') \cdot \bigl( \sum_{j=1}^{k-1} |w_j| + \sum_{j=k}^{n} |w_j| \bigr)$.

Claim 29. Let $(v_0, \ldots, v_{k-1})$ be any feasible solution to $\mathcal{LP}$ and consider the LTF

\[ f'(x) = \mathrm{sign}\Bigl( v_0 + \sum_{j=1}^{k-1} v_j x_j + \sum_{j=k}^{n} w_j x_j \Bigr). \]

Then $f'(x) = f(x)$ for all $x \in \{-1,1\}^n$.
Proof. Given $x \in \{-1,1\}^n$, we have

\[ h(x) = h(x_1, \ldots, x_{k-1}) = w_0 + \sum_{j=1}^{k-1} w_j x_j. \]

Let us also define

\[ h'(x) = h'(x_1, \ldots, x_{k-1}) = v_0 + \sum_{j=1}^{k-1} v_j x_j \quad \text{and} \quad t(x) = \sum_{j=k}^{n} w_j x_j. \]

Then, we have $f(x) = \mathrm{sign}(h(x) + t(x))$ and $f'(x) = \mathrm{sign}(h'(x) + t(x))$. Now, if $x \in \{-1,1\}^n$ is an
input such that $|h(x)| < C$, then we have $h'(x) = h(x)$ by construction, and hence $f(x) = f'(x)$. If
$x \in \{-1,1\}^n$ is such that $|h(x)| \ge C$, then by construction we also have that $|h'(x)| \ge C$. Also, note
that $h(x)$ and $h'(x)$ always have the same sign. Hence, in order for $f$ and $f'$ to disagree on $x$, it must
be the case that $|t(x)| \ge C$. But this is not possible, since $|t(x)| \le \sum_{j=k}^{n} |w_j| \le \sqrt{n-k+1} \cdot \sigma_k < C$.
This completes the proof of the claim. □
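As a sanity check of Claim 29, the following minimal Python sketch brute-forces a toy instance. The weights, the choice $k = 2$, the candidate head weights, and the helper names `satisfies_16` and `agree_everywhere` are all illustrative assumptions, not taken from the paper. Only constraint (16) is checked in code; for this instance the sign, ordering, and threshold constraints can be verified by inspection.

```python
import itertools
import math

def sign(z):
    return 1 if z >= 0 else -1

def affine(head, x):
    # head[0] is the constant term; head[1:] multiply x[0], x[1], ...
    return head[0] + sum(c * xi for c, xi in zip(head[1:], x))

def satisfies_16(w_head, v_head, C):
    """Check constraint (16) for every x in {-1,1}^(k-1)."""
    for x in itertools.product((-1, 1), repeat=len(w_head) - 1):
        h, h_new = affine(w_head, x), affine(v_head, x)
        if h >= C and not h_new >= C:
            return False
        if h <= -C and not h_new <= -C:
            return False
        if abs(h) < C and not math.isclose(h_new, h):
            return False
    return True

def agree_everywhere(w_head, v_head, tail):
    """Brute-force comparison of f and f' on all of {-1,1}^n."""
    k_minus_1 = len(w_head) - 1
    n = k_minus_1 + len(tail)
    for x in itertools.product((-1, 1), repeat=n):
        head_x, tail_x = x[:k_minus_1], x[k_minus_1:]
        t = sum(c * xi for c, xi in zip(tail, tail_x))
        if sign(affine(w_head, head_x) + t) != sign(affine(v_head, head_x) + t):
            return False
    return True

# Hypothetical toy instance with n = 3, k = 2.
w_head = [1.0, 100.0]          # (w_0, w_1): the original head weights
tail = [1.0, 1.0]              # (w_2, w_3): kept unchanged
n = 3
sigma_k = math.sqrt(sum(c * c for c in tail))
C = math.sqrt(n) * sigma_k     # C := sqrt(n) * sigma_k

v_head = [0.5, 4.0]            # candidate with much smaller head weights
ok_16 = satisfies_16(w_head, v_head, C)
same = agree_everywhere(w_head, v_head, tail)
```

In this instance the head weight 100 can be shrunk to 4 without changing the function, because the tail can never contribute more than $C$ in absolute value. A candidate such as `[0.5, 1.0]` violates (16) and does change the function.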

We are almost done, except that we need to choose a solution $(v_0, \ldots, v_{k-1})$ to $\mathcal{LP}$ satisfying
property (iii) in the statement of the theorem. The next claim ensures that this can always be
achieved.

Claim 30. There is a feasible solution $v = (v_0, \ldots, v_{k-1})$ to the $\mathcal{LP}$ which satisfies property (iii)
in the statement of the theorem.

Proof. We select a feasible solution $v = (v_0, \ldots, v_{k-1})$ to the $\mathcal{LP}$ that maximizes the number of tight
inequalities (i.e., satisfied with equality). If more than one feasible solution satisfies this property,
we choose one arbitrarily. We require the following fact from [MTT61] (a proof can be found in
[Has94, DS09]).

Fact 31. There exists a linear system $A \cdot v = b$ that uniquely specifies the vector $v$. The rows of
$(A, b)$ correspond to rows of the constraint matrix of $\mathcal{LP}$ and the corresponding RHS respectively.

At this point, we use Cramer's rule to complete the argument. In particular, note that $v_i =
\det(A_i) / \det(A)$, where $A_i$ is the matrix obtained by replacing the $i$-th column of $A$ by $b$. We
want to give an upper bound on the magnitude of $v_i$; we do this by showing a lower
bound on $|\det(A)|$ and an upper bound on $|\det(A_i)|$.

We start by showing that $|\det(A)| \ge \eta'$. First, since $A$ is invertible, $\det(A) \ne 0$. Now, note
that all rows of $A$ have entries in $\{-1, 0, 1\}$ except potentially one "special" row which has entries
from the set $\{\pm 1, \pm(1-\eta')\}$. If the special row does not appear, it is clear that $|\det(A)| \ge 1$, since
it is not zero and the entries of $A$ are all integers. If, on the other hand, the special row appears,
simply expanding $\det(A)$ along that row gives that $\det(A) = a \cdot (1-\eta') + b$ where $a, b \in \mathbb{Z}$. As
$\eta' = 1/q$ for some $q \in \mathbb{Z}$ and $\det(A) \ne 0$, we deduce that $|\det(A)| \ge \eta'$, as desired.
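The final deduction can be spelled out as follows. This is only a worked restatement of the step above, using that $\eta' = 1/q$ for the positive integer $q = \lceil 1/\eta \rceil$:

```latex
\[
\det(A) \;=\; a\Bigl(1 - \frac{1}{q}\Bigr) + b \;=\; \frac{a(q-1) + bq}{q},
\qquad a, b \in \mathbb{Z}.
\]
Since $\det(A) \neq 0$, the integer $a(q-1) + bq$ is nonzero, and therefore
\[
|\det(A)| \;=\; \frac{|a(q-1) + bq|}{q} \;\ge\; \frac{1}{q} \;=\; \eta'.
\]
```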

We bound $|\det(A_i)|$ from above by recalling the following fact.

Fact 32. (Hadamard's inequality) If $A \in \mathbb{R}^{n \times n}$ and $v_1, \ldots, v_n \in \mathbb{R}^n$ are the columns of $A$, then
$|\det(A)| \le \prod_{i=1}^{n} \|v_i\|_2$.

Now, observe that for all $i$, the $i$-th column of $A_i$ (i.e., vector $b$) has all its entries bounded by
$C$, hence $\|v_i\|_2 \le C\sqrt{k}$. All other columns have entries bounded from above by 1 and thus for $j \ne i$,
$\|v_j\|_2 \le \sqrt{k}$. Therefore, $|\det(A_i)| \le C \cdot k^{k/2}$. Thus, we conclude that $|v_i| \le (C \cdot k^{k/2})/\eta'$. Further, as
$(1/\eta') = \lceil 1/\eta \rceil < (2/\eta)$, we get $|v_i| \le 2C \cdot k^{k/2}/\eta$, completing the proof of the claim. □

The proof of Theorem 3 is now complete. □

B Estimating correlations and Shapley values 

Our algorithms need to estimate expectations of the form $f^*(i) = \mathbf{E}_{x \sim \mu}[f(x) x_i]$ and to estimate
Shapley values $\tilde{f}(i)$, where $f : \{-1,1\}^n \to [-1,1]$ is an explicitly given function (an LBF). This
is quite straightforward using standard techniques (see e.g. [BMR+10]) but for completeness we
briefly state and prove the estimation guarantees that we will need.

Estimating correlations with variables. We will use the following: 

Proposition 33. There is a procedure Estimate-Correlation with the following properties: The
procedure is given oracle access to a function $f : \{-1,1\}^n \to [-1,1]$, a desired accuracy parameter $\gamma$,
and a desired failure probability $\delta$. The procedure makes $O(n \log(n/\delta)/\gamma^2)$ oracle calls to $f$ and runs
in time $O(n^2 \log(n/\delta)/\gamma^2)$ (counting each oracle call to $f$ as taking one time step). With probability
$1 - \delta$ it outputs a list of numbers $a^*(0), a^*(1), \ldots, a^*(n)$ such that $|a^*(j) - f^*(j)| \le \gamma/\sqrt{n+1}$ for
all $j = 0, \ldots, n$. (Recall that $f^*(j)$ equals $\mathbf{E}_{x \sim \mu}[f(x) x_j]$, where $x_0 \equiv 1$.)

Proof. The procedure works simply by empirically estimating all the values $f^*(j) = \mathbf{E}_{x \sim \mu}[f(x) x_j]$,
$j = 0, \ldots, n$, using a single sample of $m$ independent draws from $\mu$. Since the random variable
$(f(x) x_j)_{x \sim \mu}$ is bounded by 1 in absolute value, a straightforward Chernoff bound gives that for $m =
O(n \log(n/\delta)/\gamma^2)$, each estimate $a^*(j)$ of $f^*(j)$ is accurate to within an additive $\pm\gamma/\sqrt{n+1}$ with
failure probability at most $\delta/(n+1)$. A union bound over $j = 0, \ldots, n$ finishes the argument. □
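The estimator in this proof can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: uniform draws from $\{-1,1\}^n$ stand in for the distribution $\mu$ (which is not specified in this section), and the function name `estimate_correlation`, the test function `dictator`, and the sample size are hypothetical choices rather than the paper's parameters.

```python
import random

def estimate_correlation(f, n, m, rng):
    """Return [a*(0), ..., a*(n)]: empirical means of f(x) * x_j, with x_0 = 1."""
    sums = [0.0] * (n + 1)
    for _ in range(m):
        # ASSUMPTION: uniform draws stand in for the distribution mu.
        x = [1] + [rng.choice((-1, 1)) for _ in range(n)]
        fx = f(x[1:])                 # one oracle call per sample
        for j in range(n + 1):        # reuse the same sample for every j
            sums[j] += fx * x[j]
    return [s / m for s in sums]

def dictator(x):
    # Hypothetical test function: f(x) = x_1.
    return x[0]

est = estimate_correlation(dictator, n=4, m=5000, rng=random.Random(0))
```

Each sample costs one oracle call and $n + 1$ multiplications, which is where the extra factor of $n$ in the running time comes from. For the dictator function the entry `est[1]` is exactly 1, and the remaining entries concentrate around 0.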

Estimating Shapley values. This is equally straightforward: 

Proposition 34. There is a procedure Estimate-Shapley with the following properties: The procedure
is given oracle access to a function $f : \{-1,1\}^n \to [-1,1]$, a desired accuracy parameter $\gamma$, and a
desired failure probability $\delta$. The procedure makes $O(n \log(n/\delta)/\gamma^2)$ oracle calls to $f$ and runs in
time $O(n^2 \log(n/\delta)/\gamma^2)$ (counting each oracle call to $f$ as taking one time step). With probability
$1 - \delta$ it outputs a list of numbers $\tilde{a}(1), \ldots, \tilde{a}(n)$ such that $d_{\mathrm{Shapley}}(a, f) \le \gamma$.

Proof. The procedure empirically estimates each $\tilde{f}(j)$, $j = 1, \ldots, n$, to additive accuracy $\gamma/\sqrt{n}$
using Equation (1). This is done by generating a uniform random $\pi \sim S_n$ and then, for each
$i = 1, \ldots, n$, constructing the two inputs $x^+(\pi, i)$ and $x(\pi, i)$ and calling the oracle for $f$ twice
to compute $f(x^+(\pi, i)) - f(x(\pi, i))$. Since $|f(x^+(\pi, i)) - f(x(\pi, i))| \le 2$ always, a sample of $m =
O(n \log(n/\delta)/\gamma^2)$ permutations suffices to estimate all the $\tilde{f}(i)$ values to additive accuracy $\pm\gamma/\sqrt{n}$
with total failure probability at most $\delta$. If each estimate $\tilde{a}(i)$ is additively accurate to within
$\pm\gamma/\sqrt{n}$, then $d_{\mathrm{Shapley}}(a, f) \le \gamma$ as desired. □
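The permutation-sampling estimator described in this proof can be sketched as follows. This is a minimal illustration, not the paper's exact procedure. It assumes that $x(\pi, i)$ has $+1$ exactly on the coordinates preceding $i$ in $\pi$ (and $-1$ elsewhere), and that $x^+(\pi, i)$ additionally sets coordinate $i$ to $+1$; these definitions are not restated in this section. The names `estimate_shapley`, `dictator`, and `majority` are hypothetical.

```python
import random

def estimate_shapley(f, n, m, rng):
    """Average f(x^+(pi, i)) - f(x(pi, i)) over m random permutations pi."""
    sums = [0.0] * n
    for _ in range(m):
        order = list(range(n))
        rng.shuffle(order)             # order lists coordinates by rank under pi
        x = [-1] * n                   # ASSUMPTION: coordinates not yet seen are -1
        for i in order:
            before = f(x)              # f(x(pi, i)): only predecessors of i are +1
            x[i] = 1
            sums[i] += f(x) - before   # f(x^+(pi, i)) - f(x(pi, i))
    return [s / m for s in sums]

def dictator(x):
    return x[0]

def majority(x):
    return 1 if sum(x) > 0 else -1

est_dict = estimate_shapley(dictator, n=3, m=200, rng=random.Random(1))
est_maj = estimate_shapley(majority, n=3, m=3000, rng=random.Random(2))
```

Walking along a single permutation lets all $n$ differences be computed from that one draw, with two oracle calls per coordinate as in the proof. For the dictator function the first estimate is exactly 2 and the others are exactly 0. For majority on three variables the estimates are close to one another by symmetry, and they sum to $f(1,1,1) - f(-1,-1,-1) = 2$ because the differences telescope along each permutation.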